Deep Learning CA1¶
Name: Law Wei Tin
Class: DAAA/FT/2A/02
Admin number: 2415761
Background Research¶
Class Characteristics¶
- Bean
General Description:
The images display beans with an elongated and thin shape. Their textures are smooth, and they are generally green.
Potential Challenges:
This class might be confused with Bottle Gourd or Cucumber due to their similar elongated shape, so our model may struggle to tell these classes apart.
- Bitter Gourd
General Description:
They are elongated and bumpy with a rough texture. They are green and somewhat resemble beans.
Potential Challenges:
This bumpy and rough texture might be lost at low resolution. Similarly, the model might mistake this class for a Bean or Cucumber.
- Brinjal
General Description:
Round, oval shape with a smooth and shiny texture. They generally have a purple body and a green head.
Potential Challenges:
Brinjals are easily distinguishable if color is retained; however, color is discarded due to the assignment specifications.
- Cabbage
General Description:
The cabbages have a round and leafy shape. They have a greenish white color, and a layered, leafy texture.
Potential Challenges:
At low resolutions, this class might resemble a cauliflower due to their similar shape.
- Capsicum
General Description:
They are generally a bit blocky and glossy. They have a smooth texture, and have a relatively wide variety of color. (red, green, yellow)
Potential Challenges:
Color is not retained due to assignment specifications, making the capsicum class lose one of its most distinguishable properties.
- Cauliflower and Broccoli
General Description:
These two vegetables have a floret-shaped structure, and a granular texture. They are white or green in color.
Potential Challenges:
This class generally poses few challenges thanks to its distinct structure, although it is unclear whether that structure can be preserved at low resolutions.
- Cucumber and Bottle_Gourd
General Description:
They are elongated with a smooth texture. Both are green in color.
Potential Challenges:
Their shape is similar to beans, which might pose a problem for our model.
- Potato
General Description:
Potatoes have a circular/oval shape, a slightly rough texture, and a brownish-yellow color.
Potential Challenges:
They may easily be confused with other round vegetables, such as pumpkins or tomatoes, especially at low resolutions.
- Pumpkin
General Description:
Pumpkins are large and round, have a ribbed texture, and have an orange body and a green tip.
Potential Challenges:
Easily confused with potatoes or tomatoes due to their similar shape, especially at low resolutions.
- Radish and Carrot
General Description:
These two have a distinct structure, being tapered and pointy. Having a white/orange/red color, they also have a smooth texture.
Potential Challenges:
This class generally poses few challenges, other than its distinctive structure potentially being lost at low resolutions.
- Tomato
General Description:
Tomatoes are round, glossy and smooth. They are also red in color.
Potential Challenges:
Easily confused with potatoes or pumpkins due to their similar shape, especially at low resolutions.
Dataset implications:¶
After a brief analysis of the dataset, we can observe that certain things are out of place. For example, the class labels for the train, validation and test sets differ across folders. We will delve deeper into this later.
Furthermore, looking closely at the folders provided in the train dataset, we can find images in the wrong classes. For example, there are approximately 11 carrot images in the 'Bean' folder of the train dataset. This can mislead our model and produce incorrect insights, so we will handle it during our exploratory data analysis.
CNN Architectures:¶
Since this assignment focuses on convolutional neural networks (CNNs), we began by exploring some of the most influential and widely used CNN architectures in the field. Each of the models below brings different strengths to the table, and by implementing and comparing them on our vegetable-fruit dataset, we not only gain a deeper understanding of how architectural choices affect performance, but also build a robust classifier tailored to our problem.
- Custom CNN
Why? Serves as our lightweight, task-specific baseline. By hand-crafting the number and size of convolutional blocks, dropout rates, and classifier head, we can directly observe how each design decision impacts accuracy on small (23x23) grayscale images.
- VGG-Style CNN
Why? Emulates the simplicity and depth of the classic VGG family—stacked 3x3 convolutions with pooling—that's known for very clean feature hierarchies. Although originally designed for 224x224 RGB inputs, a "mini-VGG" adapted to our 23x23 and 101x101 grayscale inputs shows how deeper, uniform layers can improve representational power.
- ResNet-50 (Mini-ResNet Variant)
Why? Introduces residual connections that help gradient flow through very deep networks, preventing vanishing gradients. Even a slimmed-down "mini-ResNet" demonstrates how identity shortcuts let us stack more layers without performance degradation—critical when exploring depth vs. input resolution trade-offs.
- DenseNet-Style CNN
Why? Uses dense connectivity, where each layer receives the outputs of all previous layers. This encourages maximal feature reuse and reduces overall parameter count. On small-input datasets, DenseNets often generalize well, making them a valuable counterpoint to both plain and residual CNNs.
- MobileNet-Lite
Why? Employs depthwise separable convolutions to dramatically cut computation and model size while retaining accuracy. This is especially important for low-resolution inputs or edge-device deployment. Comparing MobileNet-Lite with our heavier models illustrates the trade-off between efficiency and representational capacity.
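To make the architectural differences concrete, the three distinctive building blocks above can be sketched in Keras. This is an illustrative sketch under assumed layer widths, not our final tuned models; chaining the three block types in one network here is purely to show the mechanics on our 23x23 grayscale input shape.

```python
import tensorflow as tf
from tensorflow.keras import layers

def vgg_block(x, filters):
    # VGG-style: two stacked 3x3 convolutions followed by pooling
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D()(x)

def residual_block(x, filters):
    # ResNet-style: identity shortcut added to the convolution path
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.ReLU()(layers.Add()([shortcut, y]))

def mobilenet_block(x, filters):
    # MobileNet-style: depthwise convolution then pointwise (1x1) convolution
    x = layers.DepthwiseConv2D(3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 1, activation="relu")(x)

inputs = layers.Input(shape=(23, 23, 1))
x = vgg_block(inputs, 32)
x = residual_block(x, 32)   # input/output channels match (32), so the add is valid
x = mobilenet_block(x, 64)
outputs = layers.Dense(11, activation="softmax")(layers.GlobalAveragePooling2D()(x))
model = tf.keras.Model(inputs, outputs)
```

Note that the residual block keeps the channel count unchanged so the shortcut addition is shape-compatible; real ResNets use a 1x1 projection on the shortcut when the channel count changes.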
pip installing necessary libraries¶
visualkeras: Used to display our model architecture in a very visually appealing way.
keras-tuner: Used to hypertune our model.
# pip install visualkeras
# pip install keras-tuner
Importing of libraries¶
# OS module for accessing the images
import os
# TensorFlow for building and training deep learning models
import tensorflow as tf
# Pandas for data manipulation and analysis (e.g., reading CSV files, handling dataframes)
import pandas as pd
# NumPy for numerical operations, especially arrays and matrices
import numpy as np
# Matplotlib for plotting and visualizing data and training metrics
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Random module for generating random numbers, useful for reproducibility or data augmentation
import random
import math
# OpenCV for image processing tasks (e.g., reading, transforming images)
import cv2
# Shutil for file operations such as copying, moving, and deleting files and directories
import shutil
# PIL (Python Imaging Library) for image manipulation like resizing, enhancing, or converting
from PIL import Image, ImageEnhance, ImageOps, ImageFont
# Libraries used for removing duplicates
import hashlib
from collections import defaultdict
# Keras Tuner: tools for hyperparameter tuning using Random Search and HyperParameters
from keras_tuner import HyperParameters, RandomSearch
# Scikit-learn metrics for evaluating model performance: confusion matrix, visualization
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
# Visualizing Model Architecture
import visualkeras
# Mounting drive to access folders inside
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Importing of dataset¶
Here we will use a command in order to unzip the dataset folder.
We uploaded a zipped folder because uploading an unzipped folder would take too long. Unzipping in Colab takes approximately 10 seconds.
# Command for unzipping zipped folder
!unzip -q /content/drive/MyDrive/Datasets/image_dataset.zip
dataset_path = "/content/Dataset for CA1 part A - AY2526S1/train"
Exploratory Data Analysis¶
We will begin by conducting an exploratory data analysis of the data, to gain a better understanding of the characteristics of the dataset.
# Show all available classes in train set, and we establish this as the main class folder names
class_names = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
print(class_names)
['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
# Viewing all folder names in each set, to check whether there are any differences (which there are)
train_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
val_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/validation"))
test_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/test"))
print("Train classes:", train_classes)
print("Validation classes:", val_classes)
print("Test classes:", test_classes)
Train classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Validation classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower with Broccoli', 'Cucumber with Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish with Carrot', 'Tomato']
Test classes: ['Bean', 'Bitter_Gourd', 'Bottle_Gourd and Cucumber', 'Brinjal', 'Broccoli and Cauliflower', 'Cabbage', 'Capsicum (apparently)', 'Carrot and Radish', 'Potato', 'Pumpkin (purportedly)', 'Tomato (ostensibly)']
We notice that some folder names differ between the train and validation sets. For example, 'Cucumber and Bottle_Gourd' in the train dataset appears as 'Cucumber with Bottle_Gourd' in the validation dataset. This can result in wrong model evaluations.
We also observe wrong names in the testing dataset as well.
In order to analyze this issue more in depth, we will print out the images of each set, the train, validation and test.
Data Visualization¶
Firstly, we take a look at a sample image from each class of the dataset.
# Number of images to show
# Grid settings: 4 columns, compute rows as needed
num_classes = 11
ncols = 4
nrows = math.ceil(num_classes / ncols)
Training dataset¶
We will observe the colored version and the grayscale version.
plt.figure(figsize=(ncols * 3, nrows * 3))
for idx, cls in enumerate(class_names):
    class_path = os.path.join(dataset_path, cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)
    # read & convert to RGB
    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image)
    ax.set_title(cls)
    ax.axis('off')
# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')
plt.tight_layout()
plt.show()
Next, we show the grayscale version of the train set.
plt.figure(figsize=(ncols * 3, nrows * 3))
for idx, cls in enumerate(class_names):
    class_path = os.path.join(dataset_path, cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)
    # read & convert to grayscale
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image, cmap='gray')
    ax.set_title(cls)
    ax.axis('off')
# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')
plt.tight_layout()
plt.show()
Validation dataset¶
plt.figure(figsize=(ncols * 3, nrows * 3))
for idx, cls in enumerate(val_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/validation", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)
    # read & convert to RGB
    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image)
    ax.set_title(cls)
    ax.axis('off')
# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')
plt.tight_layout()
plt.show()
Next, we show the grayscale version of the validation set.
plt.figure(figsize=(ncols * 3, nrows * 3))
for idx, cls in enumerate(val_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/validation", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)
    # read in grayscale
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image, cmap='gray')
    ax.set_title(cls)
    ax.axis('off')
# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')
plt.tight_layout()
plt.show()
Test dataset¶
plt.figure(figsize=(ncols * 3, nrows * 3))
for idx, cls in enumerate(test_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/test", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)
    # read & convert to RGB
    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image)
    ax.set_title(cls)
    ax.axis('off')
# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')
plt.tight_layout()
plt.show()
We can already observe something wrong here: tomatoes are labeled as pumpkins, and pumpkins are labeled as tomatoes. There are also wrong folder names, which we will discuss later. Next, we show the grayscale version of the test set.
plt.figure(figsize=(ncols * 3, nrows * 3))
for idx, cls in enumerate(test_classes):
    class_path = os.path.join("/content/Dataset for CA1 part A - AY2526S1/test", cls)
    # pick one random image from this class
    img_name = random.choice(os.listdir(class_path))
    img_path = os.path.join(class_path, img_name)
    # read & convert to grayscale
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    ax = plt.subplot(nrows, ncols, idx + 1)
    ax.imshow(image, cmap='gray')
    ax.set_title(cls)
    ax.axis('off')
# If there are any empty subplots (e.g. 12th slot), turn off their axes:
for j in range(idx + 2, nrows * ncols + 1):
    plt.subplot(nrows, ncols, j).axis('off')
plt.tight_layout()
plt.show()
We notice something terribly wrong in the test set.
For example, the class named Pumpkin (purportedly) contains images of tomatoes. Conversely, the class named Tomato (ostensibly) contains images of pumpkins.
We will need to swap the names of these two classes in order to adhere to proper convention.
Another example is Capsicum (apparently). There shouldn't be an (apparently), because the folder is in fact full of capsicum images.
There are also the issues that we mentioned earlier.
We will need to change this in order to prevent our models from classifying wrongly, and to adhere to proper naming convention.
Correcting errors in folder names¶
We will be using commands in order to rename the folders.
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Pumpkin (purportedly)" /content/'Dataset for CA1 part A - AY2526S1'/test/"Tomato"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Tomato (ostensibly)" /content/'Dataset for CA1 part A - AY2526S1'/test/"Pumpkin"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Capsicum (apparently)" /content/'Dataset for CA1 part A - AY2526S1'/test/"Capsicum"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Bottle_Gourd and Cucumber" /content/'Dataset for CA1 part A - AY2526S1'/test/"Cucumber and Bottle_Gourd"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Broccoli and Cauliflower" /content/'Dataset for CA1 part A - AY2526S1'/test/"Cauliflower and Broccoli"
!mv /content/'Dataset for CA1 part A - AY2526S1'/test/"Carrot and Radish" /content/'Dataset for CA1 part A - AY2526S1'/test/"Radish and Carrot"
!mv /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cucumber with Bottle_Gourd" /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cucumber and Bottle_Gourd"
!mv /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cauliflower with Broccoli" /content/'Dataset for CA1 part A - AY2526S1'/validation/"Cauliflower and Broccoli"
!mv /content/'Dataset for CA1 part A - AY2526S1'/validation/"Radish with Carrot" /content/'Dataset for CA1 part A - AY2526S1'/validation/"Radish and Carrot"
Checking for correct folder names:
train_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
val_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/validation"))
test_classes = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/test"))
print("Train classes:", train_classes)
print("Validation classes:", val_classes)
print("Test classes:", test_classes)
Train classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Validation classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Test classes: ['Bean', 'Bitter_Gourd', 'Brinjal', 'Cabbage', 'Capsicum', 'Cauliflower and Broccoli', 'Cucumber and Bottle_Gourd', 'Potato', 'Pumpkin', 'Radish and Carrot', 'Tomato']
Addressing the major issues in the train dataset¶
Earlier, we mentioned that we observed carrot images in the Bean folder. More precisely, there are 11 carrot images in the Bean folder, which can distort our model training.
We want to address this now, to avoid complications later when tackling issues such as class imbalance.
# Removing carrots in bean folder
# Directory containing the images
bean_dir = '/content/Dataset for CA1 part A - AY2526S1/train/Bean'
# List of filenames (without .jpg extension) to remove
filenames_to_remove = ['0001', '0002', '0003', '0004',
                       '0017', '0018', '0019', '0020',
                       '0033', '0049', '0050']
# Remove each specified file
for name in filenames_to_remove:
    file_path = os.path.join(bean_dir, f'{name}.jpg')
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Deleted: {file_path}")
    else:
        print(f"File not found: {file_path}")
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0001.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0002.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0003.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0004.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0017.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0018.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0019.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0020.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0033.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0049.jpg
Deleted: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0050.jpg
Viewing Existing Duplicates¶
# Step 1: Helper to compute MD5 hash
def file_hash(filepath):
    hasher = hashlib.md5()
    with open(filepath, 'rb') as f:
        buf = f.read()
        hasher.update(buf)
    return hasher.hexdigest()
# Step 2: Scan and collect duplicates
root_dir = '/content/Dataset for CA1 part A - AY2526S1/train'
hash_dict = defaultdict(list)
for folder in os.listdir(root_dir):
    folder_path = os.path.join(root_dir, folder)
    if os.path.isdir(folder_path):
        for file in os.listdir(folder_path):
            if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                path = os.path.join(folder_path, file)
                hash_val = file_hash(path)
                hash_dict[hash_val].append(path)
# Step 3: Display first duplicate pair (if any)
displayed = False
for files in hash_dict.values():
    if len(files) > 1:
        img1 = Image.open(files[0])
        img2 = Image.open(files[1])
        plt.figure(figsize=(15, 4))
        plt.subplot(1, 2, 1)
        plt.imshow(img1)
        plt.title(f'Duplicate 1:\n{files[0]}')
        plt.axis('off')
        plt.subplot(1, 2, 2)
        plt.imshow(img2)
        plt.title(f'Duplicate 2:\n{files[1]}')
        plt.axis('off')
        plt.show()
        displayed = True
        break
if not displayed:
    print("No duplicates found.")
# Remove all duplicate files (keep the first file in each group)
duplicates_removed = 0
for files in hash_dict.values():
    if len(files) > 1:
        for duplicate_path in files[1:]:  # skip first, delete the rest
            os.remove(duplicate_path)
            print(f"Deleted duplicate: {duplicate_path}")
            duplicates_removed += 1
print(f"\nTotal duplicates removed: {duplicates_removed}")
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Tomato/0602.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0028.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0026 - Copy.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0029.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Bean/0030 - Copy.jpg
Deleted duplicate: /content/Dataset for CA1 part A - AY2526S1/train/Cabbage/0438.jpg

Total duplicates removed: 6
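One caveat worth making explicit: MD5 hashing only catches byte-identical files, so a re-saved or re-encoded copy of the same photo would survive this cleanup. A small self-contained check of that behavior (the file names and byte contents here are made up purely for illustration):

```python
import hashlib
import os
import tempfile

def file_hash(filepath):
    # Same MD5 helper as in the duplicate scan above
    hasher = hashlib.md5()
    with open(filepath, 'rb') as f:
        hasher.update(f.read())
    return hasher.hexdigest()

tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, 'original.jpg')
copy = os.path.join(tmp_dir, 'copy.jpg')      # byte-for-byte duplicate
edited = os.path.join(tmp_dir, 'edited.jpg')  # differs by a single byte

with open(original, 'wb') as f:
    f.write(b'fake-image-bytes')
with open(copy, 'wb') as f:
    f.write(b'fake-image-bytes')
with open(edited, 'wb') as f:
    f.write(b'fake-image-bytez')

print(file_hash(original) == file_hash(copy))    # True: exact duplicates collide
print(file_hash(original) == file_hash(edited))  # False: any byte change differs
```

Detecting near-duplicates (same photo, different encoding) would require a perceptual hash instead, which we do not need here since the dataset's duplicates are exact copies.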
Observing the difference between the different pixel sizes¶
We will show a picture of the original image, a 23x23 version and a 101x101 version.
We will display a colored version for easier interpretability.
# Pick a random class
random_class = random.choice(os.listdir(dataset_path))
class_path = os.path.join(dataset_path, random_class)
# Pick a random image from the class
random_image = random.choice(os.listdir(class_path))
image_path = os.path.join(class_path, random_image)
# Read the original image (color)
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Resize to 23x23 and 101x101
small_image = cv2.resize(image_rgb, (23, 23))
large_image = cv2.resize(image_rgb, (101, 101))
# Plot side by side
plt.figure(figsize=(15, 8))
# Original
plt.subplot(1, 3, 1)
plt.imshow(image_rgb)
plt.title('Original')
plt.axis('off')
# 23x23
plt.subplot(1, 3, 2)
plt.imshow(small_image)
plt.title('23x23')
plt.axis('off')
# 101x101
plt.subplot(1, 3, 3)
plt.imshow(large_image)
plt.title('101x101')
plt.axis('off')
plt.tight_layout()
plt.show()
Class Distribution¶
When training a machine learning model, it is important to check the distribution of classes in the dataset, so we can determine whether any class balancing is needed.
If classes are imbalanced, our model might perform well on one class but poorly on another.
class_names = []
n_images_per_class = []
for class_name in os.listdir(dataset_path):
    class_path = os.path.join(dataset_path, class_name)
    if os.path.isdir(class_path):
        n_images = len(os.listdir(class_path))
        class_names.append(class_name)
        n_images_per_class.append(n_images)
        print(f"{class_name:15s}: {n_images} images")
Capsicum       : 351 images
Tomato         : 954 images
Bitter_Gourd   : 720 images
Pumpkin        : 814 images
Bean           : 780 images
Brinjal        : 868 images
Cabbage        : 502 images
Cucumber and Bottle_Gourd: 875 images
Radish and Carrot: 504 images
Potato         : 377 images
Cauliflower and Broccoli: 948 images
# Plotting the bar chart
plt.figure(figsize=(10, 6))
plt.barh(class_names, n_images_per_class, color='skyblue')
plt.xlabel('Number of Images')
plt.ylabel('Class Name')
plt.title('Number of Images per Class')
plt.xticks(rotation=45, ha='right')
plt.tight_layout() # To make sure everything fits
plt.show()
We observe that some classes contain significantly more data than others, which may cause our model to perform worse on classes with less data.
To address this, we will add images to the underrepresented classes so that each class has 954 images, the size of the largest class prior to balancing.
How will we achieve this?¶
To achieve this, we will replicate some images from the dataset and apply augmentation techniques to generate new variations of these images. This will help increase the dataset size while reducing the risk of overfitting by introducing more diversity in the data.
# Paths
balanced_dataset_path = '/content/dataset/balanced_train' # new folder
# Create balanced folder
os.makedirs(balanced_dataset_path, exist_ok=True)
max_count = max(n_images_per_class)
print(f"Largest class has {max_count} images.")
# Simple augmentation function
def augment_image(img_path):
    img = Image.open(img_path)
    # Random horizontal flip
    if random.random() > 0.5:
        img = ImageOps.mirror(img)
    # Random slight rotation
    angle = random.uniform(-15, 15)
    img = img.rotate(angle)
    # Random brightness adjustment
    enhancer = ImageEnhance.Brightness(img)
    img = enhancer.enhance(random.uniform(0.8, 1.2))
    return img
# For each class
for cls in os.listdir(dataset_path):
    cls_folder = os.path.join(dataset_path, cls)
    # Ensure it's a directory
    if not os.path.isdir(cls_folder):
        continue
    images = os.listdir(cls_folder)
    current_count = len(images)
    print(f'{cls}: {current_count}')
    # Create class folder in balanced dataset
    new_cls_folder = os.path.join(balanced_dataset_path, cls)
    os.makedirs(new_cls_folder, exist_ok=True)
    # First copy all original images
    for img in images:
        src_path = os.path.join(cls_folder, img)
        dst_path = os.path.join(new_cls_folder, img)
        shutil.copyfile(src_path, dst_path)
    # Oversample if needed to match max_count
    if current_count < max_count:
        extra_needed = max_count - current_count
        extra_images = random.choices(images, k=extra_needed)
        print(f"Augmenting {cls} with {extra_needed} images.")
        for idx, img in enumerate(extra_images):
            src_path = os.path.join(cls_folder, img)
            new_img = augment_image(src_path)
            dst_path = os.path.join(new_cls_folder, f"aug_{idx}_{img}")
            new_img.save(dst_path)
print("Dataset oversampled and lightly augmented!")
Largest class has 954 images.
Capsicum: 351
Augmenting Capsicum with 603 images.
Tomato: 954
Bitter_Gourd: 720
Augmenting Bitter_Gourd with 234 images.
Pumpkin: 814
Augmenting Pumpkin with 140 images.
Bean: 780
Augmenting Bean with 174 images.
Brinjal: 868
Augmenting Brinjal with 86 images.
Cabbage: 502
Augmenting Cabbage with 452 images.
Cucumber and Bottle_Gourd: 875
Augmenting Cucumber and Bottle_Gourd with 79 images.
Radish and Carrot: 504
Augmenting Radish and Carrot with 450 images.
Potato: 377
Augmenting Potato with 577 images.
Cauliflower and Broccoli: 948
Augmenting Cauliflower and Broccoli with 6 images.
Dataset oversampled and lightly augmented!
balanced_class_names = []
balanced_n_images_per_class = []
for class_name in os.listdir(balanced_dataset_path):
    class_path = os.path.join(balanced_dataset_path, class_name)
    if os.path.isdir(class_path):
        n_images = len(os.listdir(class_path))
        balanced_class_names.append(class_name)
        balanced_n_images_per_class.append(n_images)
        print(f"{class_name:15s}: {n_images} images")
Capsicum       : 954 images
Tomato         : 954 images
Bitter_Gourd   : 954 images
Pumpkin        : 954 images
Bean           : 954 images
Brinjal        : 954 images
Cabbage        : 954 images
Cucumber and Bottle_Gourd: 954 images
Radish and Carrot: 954 images
Potato         : 954 images
Cauliflower and Broccoli: 954 images
Average Image of Grayscale Dataset¶
(Note: this shows the average image of the original dataset, not our balanced dataset, so it is the original average image.)
# Initialize variables
sum_image = None
count = 0
expected_size = (224, 224) # Height, width
for cls_name in os.listdir(dataset_path):
    cls_path = os.path.join(dataset_path, cls_name)
    if os.path.isdir(cls_path):
        for img_name in os.listdir(cls_path):
            img_path = os.path.join(cls_path, img_name)
            try:
                # Open and convert image to grayscale
                img = Image.open(img_path).convert("L")  # "L" mode = 8-bit grayscale
                # Skip images that are not the expected size
                if img.size != (expected_size[1], expected_size[0]):  # PIL uses (width, height)
                    print(f"Skipping {img_path} due to wrong size: {img.size}")
                    continue
                img_array = np.array(img).astype(np.float32)
                if sum_image is None:
                    sum_image = img_array
                else:
                    sum_image += img_array
                count += 1
            except Exception as e:
                print(f"Error processing {img_path}: {e}")
# Calculate average
average_image = sum_image / count
average_image = np.clip(average_image, 0, 255).astype(np.uint8)
# Display the grayscale average image
plt.figure(figsize=(6, 6))
plt.imshow(average_image, cmap='gray')
plt.title('Average Grayscale Image of Dataset')
plt.axis('off')
plt.show()
Skipping /content/Dataset for CA1 part A - AY2526S1/train/Bitter_Gourd/0526.jpg due to wrong size: (224, 205)
Skipping /content/Dataset for CA1 part A - AY2526S1/train/Bitter_Gourd/0609.jpg due to wrong size: (224, 200)
This is the average image of the dataset. Although it is blurry, we can see that the objects are generally concentrated in the center of the frame.
We also notice that some images are of the wrong size; we will handle that during the preprocessing stage, since for now we are only exploring the dataset. The issue can simply be resolved by loading our images at a fixed size.
expected_size = (224, 224)
class_averages = {}
# Loop through each class
for cls_name in os.listdir(dataset_path):
    cls_path = os.path.join(dataset_path, cls_name)
    if os.path.isdir(cls_path):
        images = []
        for img_name in os.listdir(cls_path):
            img_path = os.path.join(cls_path, img_name)
            try:
                img = Image.open(img_path)
                img_array = np.array(img).astype(np.float32)
                # Skip images of wrong size
                if img_array.shape[0:2] != expected_size:
                    continue
                images.append(img_array)
            except Exception as e:
                print(f"Error processing {img_path}: {e}")
        # Only if there are valid images
        if images:
            images = np.stack(images, axis=0)  # Shape: (num_images, height, width, channels)
            avg_image = np.mean(images, axis=0)
            # Convert the average image to grayscale
            gray_avg_image = np.mean(avg_image, axis=2).astype(np.uint8)
            class_averages[cls_name] = gray_avg_image
# Plot the grayscale average image for each class
plt.figure(figsize=(15, 10))
for idx, (cls_name, gray_avg_image) in enumerate(class_averages.items()):
    plt.subplot(3, 4, idx + 1)  # Adjust depending on how many classes you have
    gray_avg_normalized = gray_avg_image / 255.0  # Normalize for visualization
    plt.imshow(gray_avg_normalized, cmap='gray')  # Display in grayscale using cmap='gray'
    plt.title(cls_name)
    plt.axis('off')
plt.tight_layout()
plt.show()
From the following, we can observe:
Center Concentration¶
Most classes (e.g., Tomato, Pumpkin, Cabbage, Cauliflower and Broccoli, Potato) have their brightest or darkest intensities at the center, suggesting that the objects are centered in the images across the dataset.
This is typical in datasets where images are preprocessed to center the main object, which is good for CNN training.
Circular Shapes¶
Some classes like Potato, Tomato, Bean show dark central blobs with fading outer regions, indicating a rounded, central shape.
This suggests our model could learn to distinguish these classes based on circularity.
Textures¶
Bitter_Gourd, Brinjal, and Radish and Carrot have more textured, noisy, or irregular patterns, possibly due to variation in object shapes or positioning.
These classes may have higher intra-class variation, which might make them harder to classify; this can be a useful insight when interpreting confusion matrices later.
Data Preprocessing¶
The following code loads both small and large image datasets for training and validation, ensuring that the images are preprocessed into batches, resized, shuffled, and ready for model input.
We also confirm the final shape of the images to verify the resizing process.
small_train = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/dataset/balanced_train",
    color_mode="grayscale",
    batch_size=32,
    image_size=(23, 23),  # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)
small_val = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Dataset for CA1 part A - AY2526S1/validation",
    color_mode="grayscale",
    batch_size=32,
    image_size=(23, 23),  # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)
large_train = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/dataset/balanced_train",
    color_mode="grayscale",
    batch_size=32,
    image_size=(101, 101),  # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)
large_val = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Dataset for CA1 part A - AY2526S1/validation",
    color_mode="grayscale",
    batch_size=32,
    image_size=(101, 101),  # strictly specified the sizes of the images
    shuffle=True,
    seed=123
)
for batch in small_train.take(1):
    imgs, labels = batch
    print("small:", imgs.shape)
for batch in large_train.take(1):
    imgs, labels = batch
    print("large:", imgs.shape)
Found 10494 files belonging to 11 classes. Found 2200 files belonging to 11 classes. Found 10494 files belonging to 11 classes. Found 2200 files belonging to 11 classes. small: (32, 23, 23, 1) large: (32, 101, 101, 1)
Normalization¶
In the original images, pixel values range from 0 to 255. By dividing each pixel value by 255.0, we rescale the inputs to the [0, 1] range, standardizing the data and helping the neural network learn more effectively.
Why Normalize?¶
Neural networks generally perform better and converge faster when input values are within a smaller, consistent range. This process also helps prevent issues with large gradients and makes the training process more stable.
def normalize_img(image, label):
    return image / 255.0, label
# Apply normalization
small_train = small_train.map(normalize_img)
small_val = small_val.map(normalize_img)
large_train = large_train.map(normalize_img)
large_val = large_val.map(normalize_img)
Data augmentation¶
We will build two versions of each training set (23x23 and 101x101, each with and without augmentation). Our models will be trained on both the augmented and non-augmented data so we can check whether augmentation actually improves performance. If it does not, we will continue with the original non-augmented sets.
Our augmentation applies random flips, rotations, zooms, and contrast adjustments.
Why data augmentation?¶
Data augmentation can potentially:
Improve generalization. By augmenting our dataset, we are artificially increasing its size and diversity. This means the model will learn to recognize features in many different contexts.
Prevent overfitting. When we have a small dataset, our model can memorize the specific images in the training set. Data augmentation combats overfitting by providing new versions of the same images, so the model sees a broader range of variations and cannot just memorize pixel values.
In summary, data augmentation is a powerful technique to make our model more robust, reduce overfitting, and improve generalization by artificially expanding the dataset with transformations. Therefore, we will try out augmentation.
data_augmentation = tf.keras.Sequential([
tf.keras.layers.RandomFlip("horizontal"),
tf.keras.layers.RandomRotation(0.1),
tf.keras.layers.RandomZoom(0.1),
tf.keras.layers.RandomContrast(0.1),
])
We will visualize an example of what our augmentation does.
# Function to display original + augmented samples
def visualize_augmentation(dataset, title, img_size):
    for images, labels in dataset.take(1):
        sample_image = images[0]
        break
    sample_image_batch = tf.expand_dims(sample_image, 0)
    # training=True is required: random augmentation layers are inactive
    # (pass-through) in inference mode
    augmented_images = [data_augmentation(sample_image_batch, training=True)[0] for _ in range(5)]
    plt.figure(figsize=(12, 3))
    plt.suptitle(f"{title} - {img_size}x{img_size}", fontsize=14)
    plt.subplot(1, 6, 1)
    plt.imshow(sample_image.numpy().squeeze(), cmap="gray")
    plt.title("Original")
    plt.axis("off")
    for i, aug_img in enumerate(augmented_images):
        plt.subplot(1, 6, i + 2)
        plt.imshow(aug_img.numpy().squeeze(), cmap="gray")
        plt.title(f"Aug {i+1}")
        plt.axis("off")
    plt.tight_layout()
    plt.show()
# Visualize for both datasets
visualize_augmentation(small_train, "Small Images", 23)
visualize_augmentation(large_train, "Large Images", 101)
AUTOTUNE¶
AUTOTUNE optimizes data pipeline performance by dynamically adjusting parallel operations for data loading and augmentation.
It improves the efficiency of our data pipeline by allowing parallel data processing, prefetching, and caching, which can dramatically speed up training.
For tasks involving large datasets and complex transformations (like augmenting images), AUTOTUNE helps avoid bottlenecks and makes better use of our hardware.
AUTOTUNE = tf.data.AUTOTUNE
small_train = small_train.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
small_val = small_val.cache().prefetch(buffer_size=AUTOTUNE)
# Augmented training data. Note: we do NOT cache() after the augmentation map,
# otherwise the same augmented images would be reused every epoch
augmented_small_train = (
    small_train
    .map(lambda x, y: (data_augmentation(x, training=True), y), num_parallel_calls=AUTOTUNE)
    .prefetch(buffer_size=AUTOTUNE)
)
large_train = large_train.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
large_val = large_val.cache().prefetch(buffer_size=AUTOTUNE)
# Augmented training data
augmented_large_train = (
    large_train
    .map(lambda x, y: (data_augmentation(x, training=True), y), num_parallel_calls=AUTOTUNE)
    .prefetch(buffer_size=AUTOTUNE)
)
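As a minimal, self-contained toy (plain integers standing in for our image batches), the cache → shuffle → batch → prefetch chain above can be exercised like this, confirming that the pipeline reorders elements without dropping any:

```python
import tensorflow as tf

# Toy pipeline mirroring the order used above: cache the raw data,
# reshuffle each epoch, batch, then prefetch so the next batch is
# prepared while the current one is being consumed.
ds = tf.data.Dataset.range(10)
ds = ds.cache().shuffle(10).batch(4).prefetch(tf.data.AUTOTUNE)

total = sum(int(batch.shape[0]) for batch in ds)
print(total)  # all 10 elements survive the pipeline
```

The same ordering applies to the image datasets: cache before shuffle so the expensive decode/resize work happens only once, while shuffling still varies between epochs.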
Model Creation¶
We will first create dictionaries in order to create a dataframe that displays all models.
We will use this dataframe to rank which models are better or worse, and then further hypertune the best models.
Let's discuss what models we are going to build:¶
- Custom CNN
Consists of three convolutional blocks with increasing filter depths.
Uses BatchNormalization after each Conv2D layer to stabilize and accelerate training.
Dropout (configurable) is used after each block for regularization and overfitting control.
Uses GlobalAveragePooling2D instead of flattening to reduce parameter count and encourage generalization.
Ends with a dense classifier (128 units → softmax), suitable for 11-class prediction.
- VGG-inspired CNN
Mimics the VGG-style deep architecture using repeated 3x3 convolutional layers.
Sequential stacking of conv layers per block (32, 64, 128), as per VGG philosophy.
Employs MaxPooling2D after blocks to downsample spatial dimensions.
Uses GlobalAveragePooling2D for compression instead of flattening—more modern.
Final Dense layer: 128 units + Dropout before softmax classifier.
In our "mini-VGG" adaptation we simply reduce the number of pooling layers (or delay them) so we never overcompress the 23x23 or 101x101 input, yet still reap the benefits of VGG's depth and locality bias. We experiment to see whether this approach improves our accuracy.
- Mini-Resnet-inspired Model
Starts with a base convolution followed by custom residual blocks.
Each residual block includes: Two Conv2D layers, BatchNormalization
Skip (identity) connections to avoid vanishing gradients and encourage gradient flow.
Employs MaxPooling2D and GlobalAveragePooling2D to reduce computation.
Final classification layer: Dense(64) + Dropout → softmax.
- Mobilenet-Lite-inspired model
Tailored for efficiency: uses SeparableConv2D (depthwise separable convolutions) to drastically reduce parameter count and computation.
Filter sizes progress from 32, 64, 128 across blocks.
Each block includes: SeparableConv2D, BatchNormalization, MaxPooling or GlobalAveragePooling.
Starts with Rescaling layer to normalize input pixels to [0, 1].
Ends with a compact Dense(64) + Dropout(0.3) → softmax.
- Mini-Densenet-inspired model
Begins with a Conv2D layer (16 filters) followed by two Dense Blocks.
Each Dense Block:
Contains 3 convolutional layers with a growth rate of 12. Uses the BatchNormalization → ReLU → Conv2D ordering. Concatenates the features of all previous layers (the key DenseNet trait).
Includes MaxPooling2D (1x2) after the first block and GlobalAveragePooling2D at the end.
Final classifier: Dense(64) + Dropout(0.3) → softmax.
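Of these components, the residual block is the least standard, so here is a minimal sketch of what such a block could look like in the Keras functional API. This is our own illustrative version, not the exact implementation used below; the 1x1 projection convolution is an assumption for handling mismatched channel counts.

```python
import tensorflow as tf

def residual_block(x, filters):
    # Two 3x3 convs with BatchNorm, plus an identity skip connection.
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Project the shortcut with a 1x1 conv if channel counts differ.
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, padding='same')(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation('relu')(y)

inputs = tf.keras.Input(shape=(23, 23, 1))
x = tf.keras.layers.Conv2D(16, 3, padding='same')(inputs)
x = residual_block(x, 16)
model = tf.keras.Model(inputs, x)
print(model.output_shape)
```

Because the skip connection carries the input forward unchanged, gradients have a direct path through the block, which is the property that mitigates vanishing gradients in deeper stacks.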
A list of potential metrics that we can use:¶
Accuracy
It gives us a quick, overall sense of how often our model is right, and it is easy to interpret. Its main downside is sensitivity to class imbalance, which we have already handled.
Precision
A simple way to think of it: "When the model says this is a carrot, how often is it actually a carrot?" High precision means few false alarms, which matters if, say, mistaking a toxic plant for an edible vegetable could be dangerous.
Recall
A simple way to think of it: "Of all the real carrots, how many did I actually detect?" Higher recall means fewer misses, which is critical if we need to catch every instance of a minority or safety-critical class.
F1 Score
This balances precision and recall into a single number, which is useful when we want a trade-off between false alarms and misses without over-penalizing either.
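All four metrics are available in scikit-learn; the snippet below is a toy illustration on hand-made labels (three classes instead of our eleven). With `average='macro'`, precision, recall, and F1 treat all classes equally, which matches our balanced setup.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hand-made labels purely for illustration.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average='macro')
rec = recall_score(y_true, y_pred, average='macro')
f1 = f1_score(y_true, y_pred, average='macro')
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Here one class-2 sample was misclassified as class 1, so recall for class 2 drops to 0.5 while precision for class 1 drops to 2/3; accuracy alone (5/6) hides which class was affected.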
What metric will we use?¶
Because we want a quick, easily interpretable comparison across our models, we will use accuracy. Its only real weakness is class imbalance, which we have already handled, so it is a safe choice here. When evaluating our best model after hypertuning, we will report all of the metrics above.
# Dictionary to store history from each model
small_history_dict = {}
large_history_dict = {}
Model callbacks¶
We used:
EarlyStopping to monitor validation loss and halt training when no further improvement is observed, preventing overfitting and conserving computational resources. We set start_from_epoch to 10 to give every model at least 10 epochs of training before comparison.
ReduceLROnPlateau to reduce the learning rate when the model's performance plateaus (i.e., when the validation loss stops improving). It automatically adjusts the learning rate based on progress, ensuring the model doesn't "get stuck" with a learning rate that is too high when improvements are minimal.
early_stop = tf.keras.callbacks.EarlyStopping(
    patience=3,
    min_delta=0.0005,
    restore_best_weights=True,
    monitor='val_loss',
    start_from_epoch=10
)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-6, verbose=1)
Dummy Baseline¶
We will create a dummy model in order to compare how well our models are doing. It has no hidden layers, and it acts like a simple linear classifier.
We can confirm whether our models are actually better than random guessing or naive learning.
def dummy_baseline_model(input_shape=(23, 23, 1), num_classes=11):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Flatten(),  # Just flatten the image
        tf.keras.layers.Dense(num_classes, activation='softmax')  # No hidden layers
    ])
    return model
# ------------------------------Small------------------------------
small_dummy_model = dummy_baseline_model()
small_dummy_model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
small_dummy_model.summary()
# ------------------------------Large------------------------------
large_dummy_model = dummy_baseline_model(input_shape=(101, 101, 1))
large_dummy_model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
large_dummy_model.summary()
Model: "sequential_26"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten_19 (Flatten) │ (None, 529) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_31 (Dense) │ (None, 11) │ 5,830 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 5,830 (22.77 KB)
Trainable params: 5,830 (22.77 KB)
Non-trainable params: 0 (0.00 B)
Model: "sequential_27"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten_20 (Flatten) │ (None, 10201) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_32 (Dense) │ (None, 11) │ 112,222 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 112,222 (438.37 KB)
Trainable params: 112,222 (438.37 KB)
Non-trainable params: 0 (0.00 B)
small_dummy_history = small_dummy_model.fit(
small_train,
validation_data=small_val,
epochs=20,
callbacks=[early_stop, reduce_lr]
)
large_dummy_history = large_dummy_model.fit(
large_train,
validation_data=large_val,
epochs=20,
callbacks=[early_stop, reduce_lr]
)
small_history_dict['Dummy Baseline'] = small_dummy_history.history
large_history_dict['Dummy Baseline'] = large_dummy_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1392 - loss: 2.3757 - val_accuracy: 0.2250 - val_loss: 2.2449 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2396 - loss: 2.1764 - val_accuracy: 0.2168 - val_loss: 2.1897 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.2721 - loss: 2.1165 - val_accuracy: 0.2482 - val_loss: 2.1651 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2949 - loss: 2.0747 - val_accuracy: 0.2791 - val_loss: 2.1421 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2986 - loss: 2.0437 - val_accuracy: 0.2941 - val_loss: 2.1319 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3127 - loss: 2.0193 - val_accuracy: 0.2895 - val_loss: 2.0913 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3192 - loss: 2.0059 - val_accuracy: 0.2868 - val_loss: 2.0895 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3374 - loss: 1.9705 - val_accuracy: 0.3082 - val_loss: 2.0793 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3397 - loss: 1.9654 - val_accuracy: 0.3114 - val_loss: 2.0671 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.3580 - loss: 1.9420 - val_accuracy: 0.2764 - val_loss: 2.1007 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3485 - loss: 1.9456 - val_accuracy: 0.3132 - val_loss: 2.0599 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.3571 - loss: 1.9209 - val_accuracy: 0.3118 - val_loss: 2.0679 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3570 - loss: 1.9192 - val_accuracy: 0.3127 - val_loss: 2.0757 - learning_rate: 0.0010 Epoch 14/20 328/328 
━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.3714 - loss: 1.9011 - val_accuracy: 0.3305 - val_loss: 2.0342 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3696 - loss: 1.9050 - val_accuracy: 0.3164 - val_loss: 2.0473 - learning_rate: 0.0010 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3761 - loss: 1.8915 - val_accuracy: 0.3095 - val_loss: 2.0499 - learning_rate: 0.0010 Epoch 17/20 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3866 - loss: 1.8708 Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3865 - loss: 1.8710 - val_accuracy: 0.2955 - val_loss: 2.0731 - learning_rate: 0.0010 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1804 - loss: 2.7404 - val_accuracy: 0.2205 - val_loss: 2.6266 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2689 - loss: 2.3223 - val_accuracy: 0.2859 - val_loss: 2.2324 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2949 - loss: 2.2145 - val_accuracy: 0.2768 - val_loss: 2.2194 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3372 - loss: 2.1115 - val_accuracy: 0.2664 - val_loss: 2.4666 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3487 - loss: 2.1101 - val_accuracy: 0.2645 - val_loss: 2.4757 - learning_rate: 0.0010 Epoch 6/20 318/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.3755 - loss: 1.9811 Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3750 - loss: 1.9837 - val_accuracy: 0.2577 - val_loss: 2.5204 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4250 - loss: 1.7545 - val_accuracy: 0.3018 - val_loss: 2.3362 - learning_rate: 5.0000e-04 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.4429 - loss: 1.7163 - val_accuracy: 0.3223 - val_loss: 2.2842 - learning_rate: 5.0000e-04 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4475 - loss: 1.7216 - val_accuracy: 0.3277 - val_loss: 2.1727 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4339 - loss: 1.7262 - val_accuracy: 0.3341 - val_loss: 2.0549 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4542 - loss: 1.6771 - val_accuracy: 0.2927 - val_loss: 2.2806 - learning_rate: 5.0000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4684 - loss: 1.6245 - val_accuracy: 0.3159 - val_loss: 2.0921 - learning_rate: 5.0000e-04 Epoch 13/20 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.4738 - loss: 1.6302 Epoch 13: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4738 - loss: 1.6304 - val_accuracy: 0.3177 - val_loss: 2.1151 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5246 - loss: 1.4988 - val_accuracy: 0.3391 - val_loss: 2.0769 - learning_rate: 2.5000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5225 - loss: 1.4936 - val_accuracy: 0.3155 - val_loss: 2.1732 - learning_rate: 2.5000e-04 Epoch 16/20 310/328 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5293 - loss: 1.4820 Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5288 - loss: 1.4831 - val_accuracy: 0.3291 - val_loss: 2.1046 - learning_rate: 2.5000e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.5512 - loss: 1.4184 - val_accuracy: 0.3264 - val_loss: 2.0920 - learning_rate: 1.2500e-04
# ------------------------------Small------------------------------
aug_small_dummy_model = dummy_baseline_model()
aug_small_dummy_model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
aug_small_dummy_model.summary()
# ------------------------------Large------------------------------
aug_large_dummy_model = dummy_baseline_model(input_shape=(101, 101, 1))
aug_large_dummy_model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
aug_large_dummy_model.summary()
Model: "sequential_28"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten_21 (Flatten) │ (None, 529) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_33 (Dense) │ (None, 11) │ 5,830 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 5,830 (22.77 KB)
Trainable params: 5,830 (22.77 KB)
Non-trainable params: 0 (0.00 B)
Model: "sequential_29"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten_22 (Flatten) │ (None, 10201) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_34 (Dense) │ (None, 11) │ 112,222 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 112,222 (438.37 KB)
Trainable params: 112,222 (438.37 KB)
Non-trainable params: 0 (0.00 B)
# ------------------------------Small------------------------------
aug_small_dummy_history = aug_small_dummy_model.fit(
augmented_small_train,
validation_data=small_val,
epochs=20,
callbacks=[early_stop, reduce_lr]
)
small_history_dict['Dummy Baseline with Augmented Data'] = aug_small_dummy_history.history
# ------------------------------Large-----------------------------
aug_large_dummy_history = aug_large_dummy_model.fit(
augmented_large_train,
validation_data=large_val,
epochs=20,
callbacks=[early_stop, reduce_lr]
)
large_history_dict['Dummy Baseline with Augmented Data'] = aug_large_dummy_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.1344 - loss: 2.3903 - val_accuracy: 0.1632 - val_loss: 2.3076 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.1764 - loss: 2.2907 - val_accuracy: 0.1755 - val_loss: 2.2851 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1910 - loss: 2.2652 - val_accuracy: 0.1845 - val_loss: 2.2749 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.1998 - loss: 2.2496 - val_accuracy: 0.1914 - val_loss: 2.2691 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2076 - loss: 2.2383 - val_accuracy: 0.1968 - val_loss: 2.2654 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2134 - loss: 2.2292 - val_accuracy: 0.1991 - val_loss: 2.2628 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2169 - loss: 2.2214 - val_accuracy: 0.2009 - val_loss: 2.2609 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2234 - loss: 2.2145 - val_accuracy: 0.1995 - val_loss: 2.2594 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.2261 - loss: 2.2083 - val_accuracy: 0.2027 - val_loss: 2.2582 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.2320 - loss: 2.2026 - val_accuracy: 0.2064 - val_loss: 2.2572 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2345 - loss: 2.1973 - val_accuracy: 0.2073 - val_loss: 2.2564 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2371 - loss: 2.1924 - val_accuracy: 0.2095 - val_loss: 2.2556 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2393 - loss: 2.1877 - val_accuracy: 0.2100 - val_loss: 2.2550 - learning_rate: 0.0010 Epoch 14/20 328/328 
━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2406 - loss: 2.1833 - val_accuracy: 0.2086 - val_loss: 2.2544 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2449 - loss: 2.1791 - val_accuracy: 0.2091 - val_loss: 2.2540 - learning_rate: 0.0010 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2476 - loss: 2.1751 - val_accuracy: 0.2114 - val_loss: 2.2536 - learning_rate: 0.0010 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.2492 - loss: 2.1713 - val_accuracy: 0.2114 - val_loss: 2.2532 - learning_rate: 0.0010 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2510 - loss: 2.1677 - val_accuracy: 0.2118 - val_loss: 2.2529 - learning_rate: 0.0010 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2531 - loss: 2.1642 - val_accuracy: 0.2114 - val_loss: 2.2527 - learning_rate: 0.0010 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2553 - loss: 2.1609 - val_accuracy: 0.2123 - val_loss: 2.2525 - learning_rate: 0.0010 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.1300 - loss: 3.1514 - val_accuracy: 0.1295 - val_loss: 2.9608 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1753 - loss: 2.6890 - val_accuracy: 0.1409 - val_loss: 3.0333 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.1910 - loss: 2.6381 - val_accuracy: 0.1455 - val_loss: 3.0445 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.2086 - loss: 2.5955 Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.2086 - loss: 2.5954 - val_accuracy: 0.1514 - val_loss: 3.0352 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.2292 - loss: 2.3724 - val_accuracy: 0.1573 - val_loss: 2.4807 - learning_rate: 5.0000e-04 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2367 - loss: 2.2796 - val_accuracy: 0.1582 - val_loss: 2.4932 - learning_rate: 5.0000e-04 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2421 - loss: 2.2647 - val_accuracy: 0.1595 - val_loss: 2.5024 - learning_rate: 5.0000e-04 Epoch 8/20 319/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.2487 - loss: 2.2499 Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.2487 - loss: 2.2495 - val_accuracy: 0.1636 - val_loss: 2.5108 - learning_rate: 5.0000e-04 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.2978 - loss: 2.0893 - val_accuracy: 0.1741 - val_loss: 2.3709 - learning_rate: 2.5000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3068 - loss: 2.0652 - val_accuracy: 0.1750 - val_loss: 2.3708 - learning_rate: 2.5000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3068 - loss: 2.0597 - val_accuracy: 0.1727 - val_loss: 2.3711 - learning_rate: 2.5000e-04 Epoch 12/20 327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.3099 - loss: 2.0537 Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3099 - loss: 2.0536 - val_accuracy: 0.1759 - val_loss: 2.3720 - learning_rate: 2.5000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3249 - loss: 1.9859 - val_accuracy: 0.1741 - val_loss: 2.3616 - learning_rate: 1.2500e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3331 - loss: 1.9811 - val_accuracy: 0.1732 - val_loss: 2.3636 - learning_rate: 1.2500e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3347 - loss: 1.9765 - val_accuracy: 0.1736 - val_loss: 2.3656 - learning_rate: 1.2500e-04 Epoch 16/20 320/328 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.3369 - loss: 1.9721 Epoch 16: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05. 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3368 - loss: 1.9720 - val_accuracy: 0.1714 - val_loss: 2.3675 - learning_rate: 1.2500e-04
Our dummy baseline reaches a training accuracy of around 0.35 and a validation accuracy of around 0.29; these figures are the floor our other models must beat.
A quick comparison of the dummy model on augmented data shows it performed worse, with a training accuracy of around 0.23 and a validation accuracy of around 0.20.
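As more models fill `small_history_dict`, a quick ranking like the following can summarize them by their best validation accuracy. This is a sketch with hypothetical numbers standing in for real training histories; the dict structure mirrors what `model.fit(...).history` produces.

```python
# Minimal sketch: rank models by best validation accuracy, using the same
# structure as small_history_dict (the numbers here are hypothetical).
histories = {
    'Dummy Baseline': {'val_accuracy': [0.23, 0.29, 0.31]},
    'Dummy Baseline with Augmented Data': {'val_accuracy': [0.16, 0.20, 0.21]},
}
best = {name: max(h['val_accuracy']) for name, h in histories.items()}
for name, acc in sorted(best.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: best val_accuracy = {acc:.2f}")
```

Running the same ranking over the real history dicts later will make the model comparison table straightforward to build.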
1. Custom CNN¶
- Custom CNN
Consists of three convolutional blocks with increasing filter depths.
Uses BatchNormalization after each Conv2D layer to stabilize and accelerate training.
Dropout (configurable) is used after each block for regularization and overfitting control.
Uses GlobalAveragePooling2D instead of flattening to reduce parameter count and encourage generalization.
Ends with a dense classifier (128 units to softmax), suitable for 11-class prediction.
def custom_cnn(input_shape=(23, 23, 1), num_classes=11,
               dropout_rate=0.5,
               filters_block1=32,
               filters_block2=64,
               dense_units=128):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        # Block 1
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block1, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),
        # Block 2 + Pooling
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters_block2, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(dropout_rate),
        # Block 3
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(dropout_rate),
        # Classifier
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(dense_units, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
# ------------------------------Small------------------------------
small_cnn = custom_cnn()
small_cnn.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
small_cnn.summary()
# ------------------------------Large------------------------------
large_cnn = custom_cnn(input_shape=(101, 101, 1))
large_cnn.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
large_cnn.summary()
Model: "sequential_42"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_160 (Conv2D) │ (None, 23, 23, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_124 │ (None, 23, 23, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_161 (Conv2D) │ (None, 23, 23, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_125 │ (None, 23, 23, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_54 (Dropout) │ (None, 23, 23, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_162 (Conv2D) │ (None, 23, 23, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_126 │ (None, 23, 23, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_163 (Conv2D) │ (None, 23, 23, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_127 │ (None, 23, 23, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_40 (MaxPooling2D) │ (None, 11, 11, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_55 (Dropout) │ (None, 11, 11, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_164 (Conv2D) │ (None, 11, 11, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_128 │ (None, 11, 11, 128) │ 512 │ │ (BatchNormalization) │ │ │ 
├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_165 (Conv2D) │ (None, 11, 11, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_129 │ (None, 11, 11, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_56 (Dropout) │ (None, 11, 11, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_30 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_83 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_57 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_84 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 306,155 (1.17 MB)
Trainable params: 305,259 (1.16 MB)
Non-trainable params: 896 (3.50 KB)
Model: "sequential_43"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_166 (Conv2D) │ (None, 101, 101, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_130 │ (None, 101, 101, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_167 (Conv2D) │ (None, 101, 101, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_131 │ (None, 101, 101, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_58 (Dropout) │ (None, 101, 101, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_168 (Conv2D) │ (None, 101, 101, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_132 │ (None, 101, 101, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_169 (Conv2D) │ (None, 101, 101, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_133 │ (None, 101, 101, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_41 (MaxPooling2D) │ (None, 50, 50, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_59 (Dropout) │ (None, 50, 50, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_170 (Conv2D) │ (None, 50, 50, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_134 │ (None, 50, 50, 128) │ 512 │ │ (BatchNormalization) │ │ │ 
├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_171 (Conv2D) │ (None, 50, 50, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_135 │ (None, 50, 50, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_60 (Dropout) │ (None, 50, 50, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_31 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_85 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_61 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_86 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 306,155 (1.17 MB)
Trainable params: 305,259 (1.16 MB)
Non-trainable params: 896 (3.50 KB)
# ------------------------------Small------------------------------
small_cnn_history = small_cnn.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['Custom CNN'] = small_cnn_history.history
# ------------------------------Large------------------------------
large_cnn_history = large_cnn.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['Custom CNN'] = large_cnn_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 26ms/step - accuracy: 0.2545 - loss: 2.1011 - val_accuracy: 0.1382 - val_loss: 6.2379 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.5043 - loss: 1.4389 - val_accuracy: 0.5141 - val_loss: 1.4404 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6197 - loss: 1.1241 - val_accuracy: 0.6441 - val_loss: 1.0432 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6778 - loss: 0.9443 - val_accuracy: 0.6355 - val_loss: 1.2838 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7387 - loss: 0.7848 - val_accuracy: 0.6841 - val_loss: 0.9091 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7774 - loss: 0.6708 - val_accuracy: 0.8109 - val_loss: 0.5668 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8002 - loss: 0.6115 - val_accuracy: 0.7800 - val_loss: 0.6434 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.8108 - loss: 0.5541 - val_accuracy: 0.8495 - val_loss: 0.4859 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8505 - loss: 0.4592 - val_accuracy: 0.8527 - val_loss: 0.4294 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.8557 - loss: 0.4390 - val_accuracy: 0.8595 - val_loss: 0.4123 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8640 - loss: 0.4179 - val_accuracy: 0.7750 - val_loss: 0.7229 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8845 - loss: 0.3590 - val_accuracy: 0.8882 - val_loss: 0.3677 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.8819 - loss: 0.3482 - val_accuracy: 0.8755 - val_loss: 0.4039 - learning_rate: 0.0010 Epoch 14/20 
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.8854 - loss: 0.3517 - val_accuracy: 0.8850 - val_loss: 0.3799 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9046 - loss: 0.2838 - val_accuracy: 0.8868 - val_loss: 0.3223 - learning_rate: 0.0010 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.9029 - loss: 0.2942 - val_accuracy: 0.8073 - val_loss: 0.7714 - learning_rate: 0.0010 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9146 - loss: 0.2556 - val_accuracy: 0.9105 - val_loss: 0.2972 - learning_rate: 0.0010 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9149 - loss: 0.2439 - val_accuracy: 0.9086 - val_loss: 0.2623 - learning_rate: 0.0010 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9217 - loss: 0.2417 - val_accuracy: 0.8768 - val_loss: 0.4071 - learning_rate: 0.0010 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9256 - loss: 0.2263 - val_accuracy: 0.9077 - val_loss: 0.2976 - learning_rate: 0.0010 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 43s 101ms/step - accuracy: 0.3177 - loss: 1.9476 - val_accuracy: 0.0909 - val_loss: 8.0101 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 30s 82ms/step - accuracy: 0.5992 - loss: 1.1908 - val_accuracy: 0.4718 - val_loss: 1.8686 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.7616 - loss: 0.7423 - val_accuracy: 0.3195 - val_loss: 10.0317 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 27s 83ms/step - accuracy: 0.8253 - loss: 0.5542 - val_accuracy: 0.6764 - val_loss: 1.1672 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.8704 - loss: 0.4077 - val_accuracy: 0.7023 - val_loss: 1.2906 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 81ms/step - accuracy: 0.8969 - loss: 0.3320 - val_accuracy: 0.8000 - val_loss: 0.6983 - learning_rate: 0.0010 Epoch 7/20 
328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 81ms/step - accuracy: 0.9131 - loss: 0.2731 - val_accuracy: 0.6836 - val_loss: 1.8680 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9193 - loss: 0.2498 - val_accuracy: 0.9182 - val_loss: 0.2457 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.9349 - loss: 0.2012 - val_accuracy: 0.9014 - val_loss: 0.2852 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 42s 81ms/step - accuracy: 0.9532 - loss: 0.1499 - val_accuracy: 0.6668 - val_loss: 2.6934 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.9452 - loss: 0.1748 Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9452 - loss: 0.1747 - val_accuracy: 0.8609 - val_loss: 0.4725 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 79ms/step - accuracy: 0.9652 - loss: 0.1106 - val_accuracy: 0.9464 - val_loss: 0.1648 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9754 - loss: 0.0858 - val_accuracy: 0.9309 - val_loss: 0.6129 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9788 - loss: 0.0787 - val_accuracy: 0.8882 - val_loss: 0.4159 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 27s 83ms/step - accuracy: 0.9776 - loss: 0.0797 - val_accuracy: 0.9627 - val_loss: 0.1272 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 79ms/step - accuracy: 0.9780 - loss: 0.0729 - val_accuracy: 0.9605 - val_loss: 0.1273 - learning_rate: 5.0000e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.9810 - loss: 0.0674 - val_accuracy: 0.9732 - val_loss: 0.0978 - learning_rate: 5.0000e-04 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 42s 84ms/step - accuracy: 0.9821 - loss: 0.0598 - val_accuracy: 0.8868 - val_loss: 0.8896 
- learning_rate: 5.0000e-04 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9788 - loss: 0.0700 - val_accuracy: 0.9327 - val_loss: 0.2111 - learning_rate: 5.0000e-04 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.9810 - loss: 0.0574 Epoch 20: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 79ms/step - accuracy: 0.9810 - loss: 0.0574 - val_accuracy: 0.9673 - val_loss: 0.1011 - learning_rate: 5.0000e-04
# ------------------------------Small------------------------------
aug_small_cnn = custom_cnn()
aug_small_cnn.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
aug_small_cnn.summary()
# ------------------------------Large------------------------------
aug_large_cnn = custom_cnn(input_shape=(101, 101, 1))
aug_large_cnn.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
aug_large_cnn.summary()
Model: "sequential_32"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_48 (Conv2D) │ (None, 23, 23, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_36 │ (None, 23, 23, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_49 (Conv2D) │ (None, 23, 23, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_37 │ (None, 23, 23, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_26 (Dropout) │ (None, 23, 23, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_50 (Conv2D) │ (None, 23, 23, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_38 │ (None, 23, 23, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_51 (Conv2D) │ (None, 23, 23, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_39 │ (None, 23, 23, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_10 (MaxPooling2D) │ (None, 11, 11, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_27 (Dropout) │ (None, 11, 11, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_52 (Conv2D) │ (None, 11, 11, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_40 │ (None, 11, 11, 128) │ 512 │ │ (BatchNormalization) │ │ │ 
├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_53 (Conv2D) │ (None, 11, 11, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_41 │ (None, 11, 11, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_28 (Dropout) │ (None, 11, 11, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_8 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_39 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_29 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_40 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 306,155 (1.17 MB)
Trainable params: 305,259 (1.16 MB)
Non-trainable params: 896 (3.50 KB)
Model: "sequential_33"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_54 (Conv2D) │ (None, 101, 101, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_42 │ (None, 101, 101, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_55 (Conv2D) │ (None, 101, 101, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_43 │ (None, 101, 101, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_30 (Dropout) │ (None, 101, 101, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_56 (Conv2D) │ (None, 101, 101, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_44 │ (None, 101, 101, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_57 (Conv2D) │ (None, 101, 101, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_45 │ (None, 101, 101, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_11 (MaxPooling2D) │ (None, 50, 50, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_31 (Dropout) │ (None, 50, 50, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_58 (Conv2D) │ (None, 50, 50, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_46 │ (None, 50, 50, 128) │ 512 │ │ (BatchNormalization) │ │ │ 
├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_59 (Conv2D) │ (None, 50, 50, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_47 │ (None, 50, 50, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_32 (Dropout) │ (None, 50, 50, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_9 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_41 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_33 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_42 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 306,155 (1.17 MB)
Trainable params: 305,259 (1.16 MB)
Non-trainable params: 896 (3.50 KB)
# ------------------------------Small------------------------------
aug_small_cnn_history = aug_small_cnn.fit(
    augmented_small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['Custom CNN with Augmented Data'] = aug_small_cnn_history.history
# ------------------------------Large------------------------------
aug_large_cnn_history = aug_large_cnn.fit(
    augmented_large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['Custom CNN with Augmented Data'] = aug_large_cnn_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 26ms/step - accuracy: 0.2246 - loss: 2.1962 - val_accuracy: 0.0836 - val_loss: 6.3724 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.3892 - loss: 1.7874 - val_accuracy: 0.4377 - val_loss: 1.7344 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.4878 - loss: 1.4998 - val_accuracy: 0.3182 - val_loss: 2.0995 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5421 - loss: 1.3568 - val_accuracy: 0.3200 - val_loss: 2.6029 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5839 - loss: 1.2419 - val_accuracy: 0.4868 - val_loss: 1.6488 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6251 - loss: 1.1129 - val_accuracy: 0.5114 - val_loss: 1.5459 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6454 - loss: 1.0462 - val_accuracy: 0.4277 - val_loss: 2.1435 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.6740 - loss: 0.9788 - val_accuracy: 0.4050 - val_loss: 2.3443 - learning_rate: 0.0010 Epoch 9/20 321/328 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6891 - loss: 0.9172 Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.6894 - loss: 0.9165 - val_accuracy: 0.5018 - val_loss: 1.7964 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7273 - loss: 0.8165 - val_accuracy: 0.5491 - val_loss: 1.6418 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7431 - loss: 0.7490 - val_accuracy: 0.5014 - val_loss: 1.9048 - learning_rate: 5.0000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7603 - loss: 0.7094 - val_accuracy: 0.5886 - val_loss: 1.4703 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7647 - loss: 0.7052 - val_accuracy: 0.5423 - val_loss: 1.8088 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7753 - loss: 0.6705 - val_accuracy: 0.5409 - val_loss: 1.9494 - learning_rate: 5.0000e-04 Epoch 15/20 326/328 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7827 - loss: 0.6329 Epoch 15: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7828 - loss: 0.6328 - val_accuracy: 0.5195 - val_loss: 2.0118 - learning_rate: 5.0000e-04 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 44s 101ms/step - accuracy: 0.2955 - loss: 2.0076 - val_accuracy: 0.0950 - val_loss: 4.9160 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 29s 82ms/step - accuracy: 0.5431 - loss: 1.3448 - val_accuracy: 0.4055 - val_loss: 1.7428 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 40s 80ms/step - accuracy: 0.6368 - loss: 1.0598 - val_accuracy: 0.5264 - val_loss: 1.7484 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 27s 82ms/step - accuracy: 0.7187 - loss: 0.8396 - val_accuracy: 0.2568 - val_loss: 10.8765 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.7728 - loss: 0.7064 Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 40s 80ms/step - accuracy: 0.7729 - loss: 0.7063 - val_accuracy: 0.5050 - val_loss: 4.2580 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 28s 85ms/step - accuracy: 0.8302 - loss: 0.5357 - val_accuracy: 0.6341 - val_loss: 1.4416 - learning_rate: 5.0000e-04 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.8590 - loss: 0.4427 - val_accuracy: 0.5877 - val_loss: 1.6160 - learning_rate: 5.0000e-04 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.8706 - loss: 0.3970 - val_accuracy: 0.5382 - val_loss: 2.2017 - learning_rate: 5.0000e-04 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step - accuracy: 0.8829 - loss: 0.3618 Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 28s 86ms/step - accuracy: 0.8829 - loss: 0.3618 - val_accuracy: 0.6323 - val_loss: 1.6146 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9127 - loss: 0.2915 - val_accuracy: 0.7473 - val_loss: 0.8932 - learning_rate: 2.5000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 41s 80ms/step - accuracy: 0.9170 - loss: 0.2635 - val_accuracy: 0.7395 - val_loss: 0.9783 - learning_rate: 2.5000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 42s 85ms/step - accuracy: 0.9211 - loss: 0.2479 - val_accuracy: 0.7559 - val_loss: 0.8543 - learning_rate: 2.5000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9231 - loss: 0.2409 - val_accuracy: 0.7414 - val_loss: 0.9867 - learning_rate: 2.5000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9297 - loss: 0.2240 - val_accuracy: 0.7495 - val_loss: 0.9265 - learning_rate: 2.5000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 76ms/step - accuracy: 0.9326 - loss: 0.2111 Epoch 15: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 328/328 ━━━━━━━━━━━━━━━━━━━━ 26s 80ms/step - accuracy: 0.9327 - loss: 0.2111 - val_accuracy: 0.7341 - val_loss: 1.0314 - learning_rate: 2.5000e-04
We noticed that our custom CNN performed noticeably worse when trained on the augmented dataset: the best validation accuracy dropped from roughly 0.91 to 0.59 at the small resolution, and from roughly 0.97 to 0.76 at the large resolution.
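One plausible cause is that the augmentation was too aggressive for low-resolution grayscale inputs, distorting the shape cues the model relies on. Below is a minimal sketch of a milder Keras augmentation stack; the transforms actually used to build `augmented_small_train` are not shown in this section, so the specific layers and strengths here are assumptions.

```python
import tensorflow as tf

# Hypothetical milder augmentation pipeline (assumed, not the original one).
mild_augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # vegetables have no canonical left/right
    tf.keras.layers.RandomRotation(0.05),      # small rotations only (~+/-18 degrees)
    tf.keras.layers.RandomZoom(0.1),           # mild zoom so elongated shapes stay recognizable
])

# Applied to the training pipeline only; training=True enables the randomness.
# mild_train = small_train.map(lambda x, y: (mild_augment(x, training=True), y))
```

Re-running the comparison with progressively weaker augmentation would show whether augmentation strength, rather than augmentation itself, is what hurt the custom CNN.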
2. VGG-inspired CNN¶
- VGG-inspired CNN
Mimics the VGG-style deep architecture using repeated 3x3 convolutional layers.
Stacks two convolutional layers per block (32, 64, then 128 filters), following the VGG design philosophy.
Applies MaxPooling2D after the first two blocks to downsample the spatial dimensions.
Uses GlobalAveragePooling2D instead of Flatten, a more modern choice that keeps the classifier head small.
Ends with a 128-unit Dense layer and Dropout before the softmax classifier.
def vgg_model(input_shape=(23, 23, 1), num_classes=11):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        # Block 1: two 3x3 convs with 32 filters
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(pool_size=(1, 2)),  # note: (1, 2) halves only the width axis
        # Block 2: two 3x3 convs with 64 filters
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        # Block 3: two 3x3 convs with 128 filters, no pooling afterwards
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        # Global average pooling instead of Flatten keeps the classifier head small
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
# ------------------------------Small------------------------------
small_vgg = vgg_model()
small_vgg.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
small_vgg.summary()
# ------------------------------Large------------------------------
large_vgg = vgg_model(input_shape=(101, 101, 1))
large_vgg.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
large_vgg.summary()
Model: "sequential_44"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_172 (Conv2D) │ (None, 23, 23, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_173 (Conv2D) │ (None, 23, 23, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_42 (MaxPooling2D) │ (None, 23, 11, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_174 (Conv2D) │ (None, 23, 11, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_175 (Conv2D) │ (None, 23, 11, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_43 (MaxPooling2D) │ (None, 11, 5, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_176 (Conv2D) │ (None, 11, 5, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_177 (Conv2D) │ (None, 11, 5, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_32 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_87 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_62 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_88 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 304,363 (1.16 MB)
Trainable params: 304,363 (1.16 MB)
Non-trainable params: 0 (0.00 B)
Model: "sequential_45"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_178 (Conv2D) │ (None, 101, 101, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_179 (Conv2D) │ (None, 101, 101, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_44 (MaxPooling2D) │ (None, 101, 50, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_180 (Conv2D) │ (None, 101, 50, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_181 (Conv2D) │ (None, 101, 50, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_45 (MaxPooling2D) │ (None, 50, 25, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_182 (Conv2D) │ (None, 50, 25, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_183 (Conv2D) │ (None, 50, 25, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_33 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_89 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_63 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_90 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 304,363 (1.16 MB)
Trainable params: 304,363 (1.16 MB)
Non-trainable params: 0 (0.00 B)
# ------------------------------Small------------------------------
small_vgg_history = small_vgg.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['VGG'] = small_vgg_history.history
# ------------------------------Large------------------------------
large_vgg_history = large_vgg.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['VGG'] = large_vgg_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 8s 12ms/step - accuracy: 0.0937 - loss: 2.3984 - val_accuracy: 0.1082 - val_loss: 2.3728 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1666 - loss: 2.2724 - val_accuracy: 0.2264 - val_loss: 2.1362 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.2917 - loss: 1.9686 - val_accuracy: 0.3050 - val_loss: 1.9496 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3639 - loss: 1.8303 - val_accuracy: 0.4400 - val_loss: 1.6719 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4139 - loss: 1.6880 - val_accuracy: 0.4409 - val_loss: 1.6270 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4706 - loss: 1.5812 - val_accuracy: 0.4509 - val_loss: 1.5683 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.5217 - loss: 1.4351 - val_accuracy: 0.5718 - val_loss: 1.2660 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.5681 - loss: 1.2985 - val_accuracy: 0.5982 - val_loss: 1.2093 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6320 - loss: 1.1293 - val_accuracy: 0.6655 - val_loss: 0.9910 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6901 - loss: 0.9493 - val_accuracy: 0.6973 - val_loss: 0.8918 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7267 - loss: 0.8203 - val_accuracy: 0.7309 - val_loss: 0.7952 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7566 - loss: 0.7246 - val_accuracy: 0.7900 - val_loss: 0.6333 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7943 - loss: 0.6244 - val_accuracy: 0.7927 - val_loss: 0.6316 - learning_rate: 0.0010 Epoch 14/20 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8336 - loss: 0.5111 - val_accuracy: 0.7777 - val_loss: 0.6660 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8473 - loss: 0.4631 - val_accuracy: 0.8182 - val_loss: 0.5632 - learning_rate: 0.0010 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8662 - loss: 0.4065 - val_accuracy: 0.8023 - val_loss: 0.6385 - learning_rate: 0.0010 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8835 - loss: 0.3613 - val_accuracy: 0.8368 - val_loss: 0.5407 - learning_rate: 0.0010 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8926 - loss: 0.3304 - val_accuracy: 0.8550 - val_loss: 0.4841 - learning_rate: 0.0010 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9146 - loss: 0.2622 - val_accuracy: 0.8518 - val_loss: 0.4823 - learning_rate: 0.0010 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9162 - loss: 0.2543 - val_accuracy: 0.8441 - val_loss: 0.4845 - learning_rate: 0.0010 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 51ms/step - accuracy: 0.1026 - loss: 2.3887 - val_accuracy: 0.1782 - val_loss: 2.2549 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.1848 - loss: 2.2529 - val_accuracy: 0.2182 - val_loss: 2.2171 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.2265 - loss: 2.2034 - val_accuracy: 0.2200 - val_loss: 2.1850 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.2882 - loss: 2.0776 - val_accuracy: 0.3200 - val_loss: 1.9823 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.3492 - loss: 1.9127 - val_accuracy: 0.3418 - val_loss: 1.9219 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 43ms/step - accuracy: 0.3804 - loss: 1.7930 - val_accuracy: 0.4241 - val_loss: 1.7235 - learning_rate: 0.0010 Epoch 7/20 
328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.4301 - loss: 1.6564 - val_accuracy: 0.4918 - val_loss: 1.5147 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.4712 - loss: 1.5569 - val_accuracy: 0.5673 - val_loss: 1.3011 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.5408 - loss: 1.3848 - val_accuracy: 0.6782 - val_loss: 0.9924 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.5887 - loss: 1.2029 - val_accuracy: 0.7118 - val_loss: 0.8880 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.6683 - loss: 0.9954 - val_accuracy: 0.7409 - val_loss: 0.7648 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.6972 - loss: 0.8884 - val_accuracy: 0.8068 - val_loss: 0.6020 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 44ms/step - accuracy: 0.7432 - loss: 0.7655 - val_accuracy: 0.8041 - val_loss: 0.5961 - learning_rate: 0.0010 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.7681 - loss: 0.6929 - val_accuracy: 0.8205 - val_loss: 0.5339 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.8022 - loss: 0.6126 - val_accuracy: 0.8700 - val_loss: 0.4004 - learning_rate: 0.0010 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.8183 - loss: 0.5558 - val_accuracy: 0.8845 - val_loss: 0.3864 - learning_rate: 0.0010 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 42ms/step - accuracy: 0.8341 - loss: 0.5117 - val_accuracy: 0.9068 - val_loss: 0.3174 - learning_rate: 0.0010 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 44ms/step - accuracy: 0.8509 - loss: 0.4545 - val_accuracy: 0.8941 - val_loss: 0.3435 - learning_rate: 0.0010 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.8577 - loss: 0.4328 - val_accuracy: 0.9145 - val_loss: 0.2938 - learning_rate: 
0.0010 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.8656 - loss: 0.4118 - val_accuracy: 0.9227 - val_loss: 0.2655 - learning_rate: 0.0010
# ------------------------------Small------------------------------
aug_small_vgg = vgg_model()
aug_small_vgg.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
aug_small_vgg.summary()
# ------------------------------Large------------------------------
aug_large_vgg = vgg_model(input_shape=(101, 101, 1))
aug_large_vgg.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
aug_large_vgg.summary()
Model: "sequential_46"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_184 (Conv2D) │ (None, 23, 23, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_185 (Conv2D) │ (None, 23, 23, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_46 (MaxPooling2D) │ (None, 23, 11, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_186 (Conv2D) │ (None, 23, 11, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_187 (Conv2D) │ (None, 23, 11, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_47 (MaxPooling2D) │ (None, 11, 5, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_188 (Conv2D) │ (None, 11, 5, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_189 (Conv2D) │ (None, 11, 5, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_34 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_91 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_64 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_92 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 304,363 (1.16 MB)
Trainable params: 304,363 (1.16 MB)
Non-trainable params: 0 (0.00 B)
Model: "sequential_47"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_190 (Conv2D) │ (None, 101, 101, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_191 (Conv2D) │ (None, 101, 101, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_48 (MaxPooling2D) │ (None, 101, 50, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_192 (Conv2D) │ (None, 101, 50, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_193 (Conv2D) │ (None, 101, 50, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_49 (MaxPooling2D) │ (None, 50, 25, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_194 (Conv2D) │ (None, 50, 25, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_195 (Conv2D) │ (None, 50, 25, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_35 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_93 (Dense) │ (None, 128) │ 16,512 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_65 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_94 (Dense) │ (None, 11) │ 1,419 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 304,363 (1.16 MB)
Trainable params: 304,363 (1.16 MB)
Non-trainable params: 0 (0.00 B)
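As a sanity check (not part of the original notebook), the 304,363-parameter total reported in both summaries above can be reproduced arithmetically from the layer shapes, since a Conv2D layer has kh·kw·in_ch·out_ch weights plus out_ch biases and a Dense layer has in·out weights plus out biases:

```python
# Reproduce the VGG-style model's parameter total from its layer shapes.
def conv2d_params(kh, kw, in_ch, out_ch):
    # kernel weights + one bias per output channel
    return kh * kw * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # weight matrix + one bias per output unit
    return n_in * n_out + n_out

# (in_channels, out_channels) for the six 3x3 convolutions in the summary
conv_channels = [(1, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
total = sum(conv2d_params(3, 3, i, o) for i, o in conv_channels)
total += dense_params(128, 128)  # hidden Dense layer
total += dense_params(128, 11)   # output layer for 11 classes
print(total)  # 304363, matching the summary
```

Pooling, dropout, and global-average-pooling layers contribute no parameters, which is why both input sizes yield the same total.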
# ------------------------------Small------------------------------
aug_small_vgg_history = aug_small_vgg.fit(
    augmented_small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['VGG with Augmented Data'] = aug_small_vgg_history.history
# ------------------------------Large------------------------------
aug_large_vgg_history = aug_large_vgg.fit(
    augmented_large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['VGG with Augmented Data'] = aug_large_vgg_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 7s 12ms/step - accuracy: 0.0902 - loss: 2.3988 - val_accuracy: 0.0818 - val_loss: 2.3971 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.1367 - loss: 2.3392 - val_accuracy: 0.1791 - val_loss: 2.2224 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.1780 - loss: 2.2512 - val_accuracy: 0.2095 - val_loss: 2.1647 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.2134 - loss: 2.2164 - val_accuracy: 0.2986 - val_loss: 2.0657 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2606 - loss: 2.1261 - val_accuracy: 0.3214 - val_loss: 1.9577 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.2844 - loss: 2.0557 - val_accuracy: 0.3109 - val_loss: 2.0132 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3036 - loss: 1.9854 - val_accuracy: 0.2777 - val_loss: 2.1146 - learning_rate: 0.0010 Epoch 8/20 324/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.3291 - loss: 1.9255 Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.3293 - loss: 1.9250 - val_accuracy: 0.3177 - val_loss: 2.0561 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3726 - loss: 1.8224 - val_accuracy: 0.3655 - val_loss: 1.9080 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.3998 - loss: 1.7446 - val_accuracy: 0.3727 - val_loss: 1.9407 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4274 - loss: 1.6660 - val_accuracy: 0.3677 - val_loss: 2.0072 - learning_rate: 5.0000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.4577 - loss: 1.6016 - val_accuracy: 0.4168 - val_loss: 1.8113 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4835 - loss: 1.5260 - val_accuracy: 0.4277 - val_loss: 1.7844 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.5113 - loss: 1.4596 - val_accuracy: 0.4341 - val_loss: 1.7768 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5361 - loss: 1.3858 - val_accuracy: 0.4705 - val_loss: 1.7161 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5478 - loss: 1.3271 - val_accuracy: 0.4700 - val_loss: 1.7418 - learning_rate: 5.0000e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5748 - loss: 1.2672 - val_accuracy: 0.5136 - val_loss: 1.5474 - learning_rate: 5.0000e-04 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.5905 - loss: 1.2128 - val_accuracy: 0.5164 - val_loss: 1.5563 - learning_rate: 5.0000e-04 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6147 - loss: 1.1438 - val_accuracy: 0.4586 - val_loss: 1.9316 - learning_rate: 5.0000e-04 Epoch 20/20 317/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6301 - loss: 1.0988 Epoch 20: ReduceLROnPlateau reducing learning 
rate to 0.0002500000118743628. 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6304 - loss: 1.0982 - val_accuracy: 0.5132 - val_loss: 1.6573 - learning_rate: 5.0000e-04 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 50ms/step - accuracy: 0.1012 - loss: 2.3922 - val_accuracy: 0.1623 - val_loss: 2.3155 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 42ms/step - accuracy: 0.1653 - loss: 2.2837 - val_accuracy: 0.1909 - val_loss: 2.2069 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.2239 - loss: 2.1836 - val_accuracy: 0.2695 - val_loss: 2.0757 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.2575 - loss: 2.0939 - val_accuracy: 0.3109 - val_loss: 2.0094 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.2909 - loss: 2.0366 - val_accuracy: 0.3641 - val_loss: 1.8792 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 44ms/step - accuracy: 0.3298 - loss: 1.9410 - val_accuracy: 0.4159 - val_loss: 1.6898 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.4020 - loss: 1.7414 - val_accuracy: 0.5055 - val_loss: 1.4599 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 43ms/step - accuracy: 0.4576 - loss: 1.5496 - val_accuracy: 0.5486 - val_loss: 1.2989 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 43ms/step - accuracy: 0.5100 - loss: 1.4018 - val_accuracy: 0.5927 - val_loss: 1.1631 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 43ms/step - accuracy: 0.5409 - loss: 1.3181 - val_accuracy: 0.6114 - val_loss: 1.1246 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.5745 - loss: 1.2204 - val_accuracy: 0.6705 - val_loss: 0.9714 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.5964 - loss: 1.1551 - val_accuracy: 0.6568 - val_loss: 
1.0049 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.6231 - loss: 1.0560 - val_accuracy: 0.6868 - val_loss: 0.8748 - learning_rate: 0.0010 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 42ms/step - accuracy: 0.6443 - loss: 1.0096 - val_accuracy: 0.6727 - val_loss: 0.9522 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 44ms/step - accuracy: 0.6679 - loss: 0.9486 - val_accuracy: 0.6695 - val_loss: 1.0153 - learning_rate: 0.0010 Epoch 16/20 327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 0.6970 - loss: 0.8981 Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 43ms/step - accuracy: 0.6970 - loss: 0.8980 - val_accuracy: 0.6936 - val_loss: 0.9360 - learning_rate: 0.0010
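The learning rates printed in the logs above drop from 0.001 to 0.0005 to 0.00025, consistent with a ReduceLROnPlateau callback configured with a halving factor (an assumption here, since the callback's configuration appears earlier in the notebook). A minimal sketch of that schedule:

```python
# Each plateau in val_loss multiplies the learning rate by `factor`.
# The logged rates (0.001 -> 0.0005 -> 0.00025) match factor=0.5.
def reduced_lr(initial_lr, n_reductions, factor=0.5):
    return initial_lr * factor ** n_reductions

rates = [reduced_lr(1e-3, n) for n in range(3)]
print(rates)  # [0.001, 0.0005, 0.00025]
```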
Again, we noticed that the model performed significantly worse when fed augmented data.
3. Mini-ResNet-inspired Model¶
- Mini-ResNet-inspired Model
Starts with a base convolution followed by custom residual blocks.
Each residual block includes two Conv2D layers with BatchNormalization.
Skip (identity) connections avoid vanishing gradients and encourage gradient flow.
Employs MaxPooling2D and GlobalAveragePooling2D to reduce computation.
Final classification head: Dense(64) with Dropout, followed by a softmax output layer.
def residual_block(x, filters):
    shortcut = x
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Add()([x, shortcut])
    x = tf.keras.layers.Activation('relu')(x)
    return x
def mini_resnet(input_shape=(23, 23, 1), num_classes=11):
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = residual_block(x, 32)
    x = tf.keras.layers.MaxPooling2D(pool_size=(1, 2))(x)
    x = tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
    x = residual_block(x, 64)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(64, activation='relu')(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
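As a quick sanity check on the definition above (an aside, not part of the original notebook), the 116,939-parameter total in the summaries below can be reproduced by hand. Each BatchNormalization layer holds four values per channel (gamma, beta, and the two moving statistics), of which only the moving statistics are non-trainable:

```python
# Reproduce mini_resnet's parameter total from its layer shapes.
def conv2d_params(in_ch, out_ch, k=3):
    return k * k * in_ch * out_ch + out_ch

def bn_params(ch):
    # gamma, beta, moving_mean, moving_variance: 4 values per channel
    return 4 * ch

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

total = conv2d_params(1, 32) + bn_params(32)          # base conv + BN
total += 2 * (conv2d_params(32, 32) + bn_params(32))  # residual_block(x, 32)
total += conv2d_params(32, 64)                        # conv widening to 64
total += 2 * (conv2d_params(64, 64) + bn_params(64))  # residual_block(x, 64)
total += dense_params(64, 64) + dense_params(64, 11)  # classification head
print(total)  # 116939
```

The non-trainable count of 448 is likewise half of the 896 BatchNormalization parameters, i.e. the moving mean and variance of each BN layer.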
# ------------------------------Small------------------------------
small_resnet_model = mini_resnet()
small_resnet_model.compile(optimizer='adam',
                           loss='sparse_categorical_crossentropy',
                           metrics=['accuracy'])
small_resnet_model.summary()
# ------------------------------Large------------------------------
large_resnet_model = mini_resnet(input_shape=(101, 101, 1))
large_resnet_model.compile(optimizer='adam',
                           loss='sparse_categorical_crossentropy',
                           metrics=['accuracy'])
large_resnet_model.summary()
Model: "functional_50"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_50 │ (None, 23, 23, 1) │ 0 │ - │ │ (InputLayer) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_136 (Conv2D) │ (None, 23, 23, │ 320 │ input_layer_50[0… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 128 │ conv2d_136[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_137 (Conv2D) │ (None, 23, 23, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 128 │ conv2d_137[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_138 (Conv2D) │ (None, 23, 23, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 128 │ conv2d_138[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_8 (Add) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 32) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_8 │ (None, 23, 23, │ 0 │ add_8[0][0] │ │ (Activation) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_36 │ (None, 23, 11, │ 0 │ activation_8[0][… │ │ (MaxPooling2D) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_139 (Conv2D) │ (None, 23, 11, │ 18,496 │ max_pooling2d_36… │ │ │ 64) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_140 (Conv2D) │ (None, 23, 11, │ 36,928 │ conv2d_139[0][0] │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 256 │ conv2d_140[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_141 (Conv2D) │ (None, 23, 11, │ 36,928 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 256 │ conv2d_141[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_9 (Add) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 64) │ │ conv2d_139[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_9 │ (None, 23, 11, │ 0 │ add_9[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 64) │ 0 │ activation_9[0][… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_75 (Dense) │ (None, 64) │ 4,160 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_50 │ (None, 64) │ 0 │ dense_75[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_76 (Dense) │ (None, 11) │ 715 │ dropout_50[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 116,939 (456.79 KB)
Trainable params: 116,491 (455.04 KB)
Non-trainable params: 448 (1.75 KB)
Model: "functional_51"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_51 │ (None, 101, 101, │ 0 │ - │ │ (InputLayer) │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_142 (Conv2D) │ (None, 101, 101, │ 320 │ input_layer_51[0… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 128 │ conv2d_142[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_143 (Conv2D) │ (None, 101, 101, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 128 │ conv2d_143[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_144 (Conv2D) │ (None, 101, 101, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 128 │ conv2d_144[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_10 (Add) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 32) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_10 │ (None, 101, 101, │ 0 │ add_10[0][0] │ │ (Activation) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_37 │ (None, 101, 50, │ 0 │ activation_10[0]… │ │ (MaxPooling2D) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_145 (Conv2D) │ (None, 101, 50, │ 18,496 │ max_pooling2d_37… │ │ │ 64) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_146 (Conv2D) │ (None, 101, 50, │ 36,928 │ conv2d_145[0][0] │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 256 │ conv2d_146[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_147 (Conv2D) │ (None, 101, 50, │ 36,928 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 256 │ conv2d_147[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_11 (Add) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 64) │ │ conv2d_145[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_11 │ (None, 101, 50, │ 0 │ add_11[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 64) │ 0 │ activation_11[0]… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_77 (Dense) │ (None, 64) │ 4,160 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_51 │ (None, 64) │ 0 │ dense_77[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_78 (Dense) │ (None, 11) │ 715 │ dropout_51[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 116,939 (456.79 KB)
Trainable params: 116,491 (455.04 KB)
Non-trainable params: 448 (1.75 KB)
# ------------------------------Small------------------------------
small_resnet_history = small_resnet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['ResNet50'] = small_resnet_history.history
# ------------------------------Large------------------------------
large_resnet_history = large_resnet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['ResNet50'] = large_resnet_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 14s 17ms/step - accuracy: 0.2208 - loss: 2.2043 - val_accuracy: 0.0909 - val_loss: 4.3744 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4300 - loss: 1.6250 - val_accuracy: 0.2655 - val_loss: 2.8556 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5792 - loss: 1.2778 - val_accuracy: 0.5591 - val_loss: 1.2504 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6611 - loss: 1.0438 - val_accuracy: 0.4414 - val_loss: 1.8652 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7213 - loss: 0.8762 - val_accuracy: 0.6573 - val_loss: 1.0443 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7715 - loss: 0.7279 - val_accuracy: 0.6305 - val_loss: 1.0817 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7905 - loss: 0.6701 - val_accuracy: 0.6527 - val_loss: 1.3953 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8194 - loss: 0.5846 - val_accuracy: 0.6709 - val_loss: 1.0047 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8498 - loss: 0.4996 - val_accuracy: 0.7418 - val_loss: 0.9050 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8611 - loss: 0.4382 - val_accuracy: 0.6764 - val_loss: 0.9837 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8826 - loss: 0.3791 - val_accuracy: 0.7605 - val_loss: 0.8136 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8874 - loss: 0.3647 - val_accuracy: 0.7709 - val_loss: 0.7217 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9061 - loss: 0.3135 - val_accuracy: 0.8255 - val_loss: 0.5675 - learning_rate: 0.0010 Epoch 14/20 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9169 - loss: 0.2681 - val_accuracy: 0.8495 - val_loss: 0.5075 - learning_rate: 0.0010 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9249 - loss: 0.2468 - val_accuracy: 0.7000 - val_loss: 1.2336 - learning_rate: 0.0010 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9223 - loss: 0.2415 - val_accuracy: 0.8409 - val_loss: 0.5300 - learning_rate: 0.0010 Epoch 17/20 322/328 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9406 - loss: 0.2046 Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9405 - loss: 0.2048 - val_accuracy: 0.7345 - val_loss: 1.2525 - learning_rate: 0.0010 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 30s 69ms/step - accuracy: 0.2482 - loss: 2.1284 - val_accuracy: 0.0909 - val_loss: 4.9257 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 34s 59ms/step - accuracy: 0.4549 - loss: 1.5550 - val_accuracy: 0.1555 - val_loss: 3.9594 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 59ms/step - accuracy: 0.5697 - loss: 1.2446 - val_accuracy: 0.4791 - val_loss: 1.4131 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.6737 - loss: 0.9698 - val_accuracy: 0.6436 - val_loss: 1.1558 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 57ms/step - accuracy: 0.7348 - loss: 0.8067 - val_accuracy: 0.6427 - val_loss: 0.9848 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.7861 - loss: 0.6780 - val_accuracy: 0.7323 - val_loss: 0.9397 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 58ms/step - accuracy: 0.8310 - loss: 0.5419 - val_accuracy: 0.5468 - val_loss: 1.5731 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8548 - loss: 0.4666 - val_accuracy: 0.5991 - val_loss: 1.4657 - learning_rate: 0.0010 Epoch 9/20 
328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.8888 - loss: 0.3764 Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8888 - loss: 0.3764 - val_accuracy: 0.4982 - val_loss: 2.2022 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9161 - loss: 0.2873 - val_accuracy: 0.8527 - val_loss: 0.4331 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 57ms/step - accuracy: 0.9276 - loss: 0.2569 - val_accuracy: 0.9450 - val_loss: 0.1915 - learning_rate: 5.0000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9284 - loss: 0.2361 - val_accuracy: 0.8805 - val_loss: 0.4103 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9352 - loss: 0.2188 - val_accuracy: 0.9564 - val_loss: 0.1487 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 59ms/step - accuracy: 0.9464 - loss: 0.1962 - val_accuracy: 0.7986 - val_loss: 0.6648 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.9512 - loss: 0.1854 - val_accuracy: 0.8955 - val_loss: 0.3156 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.9439 - loss: 0.1877 Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9439 - loss: 0.1877 - val_accuracy: 0.9318 - val_loss: 0.2156 - learning_rate: 5.0000e-04
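Runs like the ones above are easiest to compare by their best validation accuracy rather than the final epoch's, since early stopping can end a run on a dip. A small helper sketch, assuming each entry of `small_history_dict` / `large_history_dict` follows the Keras `History.history` format (the `example` dict below is illustrative, not the notebook's actual data):

```python
# Summarise a training run by its best validation-accuracy epoch.
def best_val_accuracy(history):
    return max(history['val_accuracy'])

# Illustrative values only -- not taken from the notebook's dicts.
example = {'val_accuracy': [0.55, 0.74, 0.8495, 0.7345]}
print(best_val_accuracy(example))  # 0.8495
```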
# ------------------------------Small------------------------------
aug_small_resnet_model = mini_resnet()
aug_small_resnet_model.compile(optimizer='adam',
                               loss='sparse_categorical_crossentropy',
                               metrics=['accuracy'])
aug_small_resnet_model.summary()
# ------------------------------Large------------------------------
aug_large_resnet_model = mini_resnet(input_shape=(101, 101, 1))
aug_large_resnet_model.compile(optimizer='adam',
                               loss='sparse_categorical_crossentropy',
                               metrics=['accuracy'])
aug_large_resnet_model.summary()
Model: "functional_52"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_52 │ (None, 23, 23, 1) │ 0 │ - │ │ (InputLayer) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_148 (Conv2D) │ (None, 23, 23, │ 320 │ input_layer_52[0… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 128 │ conv2d_148[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_149 (Conv2D) │ (None, 23, 23, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 128 │ conv2d_149[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_150 (Conv2D) │ (None, 23, 23, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 128 │ conv2d_150[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_12 (Add) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 32) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_12 │ (None, 23, 23, │ 0 │ add_12[0][0] │ │ (Activation) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_38 │ (None, 23, 11, │ 0 │ activation_12[0]… │ │ (MaxPooling2D) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_151 (Conv2D) │ (None, 23, 11, │ 18,496 │ max_pooling2d_38… │ │ │ 64) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_152 (Conv2D) │ (None, 23, 11, │ 36,928 │ conv2d_151[0][0] │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 256 │ conv2d_152[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_153 (Conv2D) │ (None, 23, 11, │ 36,928 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 256 │ conv2d_153[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_13 (Add) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 64) │ │ conv2d_151[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_13 │ (None, 23, 11, │ 0 │ add_13[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 64) │ 0 │ activation_13[0]… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_79 (Dense) │ (None, 64) │ 4,160 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_52 │ (None, 64) │ 0 │ dense_79[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_80 (Dense) │ (None, 11) │ 715 │ dropout_52[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 116,939 (456.79 KB)
Trainable params: 116,491 (455.04 KB)
Non-trainable params: 448 (1.75 KB)
Model: "functional_53"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_53 │ (None, 101, 101, │ 0 │ - │ │ (InputLayer) │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_154 (Conv2D) │ (None, 101, 101, │ 320 │ input_layer_53[0… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 128 │ conv2d_154[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_155 (Conv2D) │ (None, 101, 101, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 128 │ conv2d_155[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_156 (Conv2D) │ (None, 101, 101, │ 9,248 │ batch_normalizat… │ │ │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 128 │ conv2d_156[0][0] │ │ (BatchNormalizatio… │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_14 (Add) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 32) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_14 │ (None, 101, 101, │ 0 │ add_14[0][0] │ │ (Activation) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_39 │ (None, 101, 50, │ 0 │ activation_14[0]… │ │ (MaxPooling2D) │ 32) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_157 (Conv2D) │ (None, 101, 50, │ 18,496 │ max_pooling2d_39… │ │ │ 64) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_158 (Conv2D) │ (None, 101, 50, │ 36,928 │ conv2d_157[0][0] │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 256 │ conv2d_158[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_159 (Conv2D) │ (None, 101, 50, │ 36,928 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 256 │ conv2d_159[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_15 (Add) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 64) │ │ conv2d_157[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_15 │ (None, 101, 50, │ 0 │ add_15[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 64) │ 0 │ activation_15[0]… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_81 (Dense) │ (None, 64) │ 4,160 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_53 │ (None, 64) │ 0 │ dense_81[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_82 (Dense) │ (None, 11) │ 715 │ dropout_53[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 116,939 (456.79 KB)
Trainable params: 116,491 (455.04 KB)
Non-trainable params: 448 (1.75 KB)
# ------------------------------Small------------------------------
aug_small_resnet_history = aug_small_resnet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['ResNet50 with Augmented Data'] = aug_small_resnet_history.history  # Save training history
# ------------------------------Large------------------------------
aug_large_resnet_history = aug_large_resnet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['ResNet50 with Augmented Data'] = aug_large_resnet_history.history  # Save training history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 21ms/step - accuracy: 0.2185 - loss: 2.1924 - val_accuracy: 0.0909 - val_loss: 12.4494 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4811 - loss: 1.5481 - val_accuracy: 0.3559 - val_loss: 1.8525 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6027 - loss: 1.1984 - val_accuracy: 0.5005 - val_loss: 1.6080 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6699 - loss: 0.9985 - val_accuracy: 0.4464 - val_loss: 1.7087 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7331 - loss: 0.8312 - val_accuracy: 0.5895 - val_loss: 1.1781 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7813 - loss: 0.6913 - val_accuracy: 0.6277 - val_loss: 1.3663 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8091 - loss: 0.5971 - val_accuracy: 0.8041 - val_loss: 0.5582 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8308 - loss: 0.5490 - val_accuracy: 0.7645 - val_loss: 0.7376 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8588 - loss: 0.4605 - val_accuracy: 0.7586 - val_loss: 0.7394 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8559 - loss: 0.4432 Epoch 10: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8559 - loss: 0.4432 - val_accuracy: 0.7727 - val_loss: 0.7658 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9135 - loss: 0.2859 - val_accuracy: 0.8259 - val_loss: 0.5543 - learning_rate: 5.0000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9357 - loss: 0.2306 - val_accuracy: 0.8818 - val_loss: 0.3875 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9387 - loss: 0.2088 - val_accuracy: 0.8850 - val_loss: 0.3903 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9435 - loss: 0.1888 - val_accuracy: 0.8391 - val_loss: 0.6033 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9531 - loss: 0.1693 - val_accuracy: 0.8950 - val_loss: 0.3598 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9580 - loss: 0.1502 - val_accuracy: 0.8645 - val_loss: 0.4930 - learning_rate: 5.0000e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9585 - loss: 0.1432 - val_accuracy: 0.8805 - val_loss: 0.4434 - learning_rate: 5.0000e-04 Epoch 18/20 321/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9610 - loss: 0.1354 Epoch 18: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9609 - loss: 0.1356 - val_accuracy: 0.8300 - val_loss: 0.7313 - learning_rate: 5.0000e-04 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 31s 72ms/step - accuracy: 0.2698 - loss: 2.0492 - val_accuracy: 0.0909 - val_loss: 3.2624 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.4920 - loss: 1.4496 - val_accuracy: 0.3668 - val_loss: 2.0656 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.6382 - loss: 1.0860 - val_accuracy: 0.3450 - val_loss: 2.1705 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.7303 - loss: 0.8487 - val_accuracy: 0.7386 - val_loss: 0.8934 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.7915 - loss: 0.6768 - val_accuracy: 0.7350 - val_loss: 0.8019 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 57ms/step - accuracy: 0.8264 - loss: 0.5428 - val_accuracy: 0.7468 - val_loss: 0.8691 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8560 - loss: 0.4682 - val_accuracy: 0.6136 - val_loss: 1.1981 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.8768 - loss: 0.4209 - val_accuracy: 0.8618 - val_loss: 0.4029 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.8954 - loss: 0.3431 - val_accuracy: 0.8009 - val_loss: 0.6974 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9127 - loss: 0.2976 - val_accuracy: 0.6545 - val_loss: 1.2640 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.9128 - loss: 0.2907 Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9128 - loss: 0.2908 - val_accuracy: 0.6964 - val_loss: 1.0517 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9474 - loss: 0.1988 - val_accuracy: 0.9109 - val_loss: 0.3740 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.9515 - loss: 0.1708 - val_accuracy: 0.8995 - val_loss: 0.3655 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9570 - loss: 0.1620 - val_accuracy: 0.9291 - val_loss: 0.2467 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9527 - loss: 0.1556 - val_accuracy: 0.9464 - val_loss: 0.1847 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9601 - loss: 0.1414 - val_accuracy: 0.9327 - val_loss: 0.2400 - learning_rate: 5.0000e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9554 - loss: 0.1522 - val_accuracy: 0.9600 - val_loss: 0.1463 - learning_rate: 5.0000e-04 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 58ms/step - accuracy: 0.9640 - loss: 0.1262 - val_accuracy: 0.8118 - val_loss: 0.8915 - learning_rate: 5.0000e-04 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 59ms/step - accuracy: 0.9655 - loss: 0.1256 - val_accuracy: 0.9727 - val_loss: 0.1055 - learning_rate: 5.0000e-04 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 19s 58ms/step - accuracy: 0.9676 - loss: 0.1161 - val_accuracy: 0.9045 - val_loss: 0.3924 - learning_rate: 5.0000e-04
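Both fit() calls reuse the early_stop and reduce_lr callbacks defined earlier in the notebook. For reference, a configuration consistent with the logs above (the learning rate halves from 1e-3 to 5e-4 and again to 2.5e-4 on plateau) would look roughly like the sketch below; the exact patience and min_lr values are assumptions, not the notebook's actual settings:

```python
import tensorflow as tf

# Assumed callback configuration (only factor=0.5 is confirmed by the logs,
# since the learning rate halves at each reduction).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=2, min_lr=1e-6)
```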
4. MobileNet-Lite-inspired model¶
- MobileNet-Lite-inspired model
Tailored for efficiency: uses SeparableConv2D (depthwise separable convolutions) to drastically cut parameter count and computation relative to standard convolutions.
Filter counts progress from 32 to 64 to 128 across the three blocks.
Each block consists of SeparableConv2D and BatchNormalization, followed by MaxPooling (or GlobalAveragePooling in the final block).
Starts with a Rescaling layer that normalizes input pixels to [0, 1].
Ends with a compact Dense(64) head and Dropout(0.3) before the softmax output.
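The parameter savings from depthwise separable convolutions can be verified with plain arithmetic. The sketch below counts parameters for a standard 3x3 Conv2D versus a SeparableConv2D (including bias terms); the results match the Keras summaries further down (320 vs 73 for the 1→32 layer, 18,496 vs 2,400 for the 32→64 layer):

```python
def conv2d_params(c_in, c_out, k=3):
    """Parameters of a standard k x k Conv2D with bias."""
    return k * k * c_in * c_out + c_out

def separable_conv2d_params(c_in, c_out, k=3):
    """Depthwise step (one k x k filter per input channel) plus
    pointwise 1x1 convolution with bias."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise

# First block, 1 -> 32 channels: 320 standard vs 73 separable.
print(conv2d_params(1, 32), separable_conv2d_params(1, 32))
# Second block, 32 -> 64 channels: 18,496 standard vs 2,400 separable.
print(conv2d_params(32, 64), separable_conv2d_params(32, 64))
```

The gap widens as channel counts grow, which is why the whole model fits in ~21K parameters versus ~117K for the ResNet-inspired model.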
def mobilenet_lite(input_shape=(23, 23, 1), num_classes=11):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1./255),
        tf.keras.layers.SeparableConv2D(32, (3, 3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(1, 2)),
        tf.keras.layers.SeparableConv2D(64, (3, 3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.SeparableConv2D(128, (3, 3), padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
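As a sanity check on the shapes reported by summary(): the padding='same' convolutions preserve height and width, while each pooling layer floor-divides its spatial dimensions by the pool size. This can be traced with integer arithmetic, independent of TensorFlow:

```python
def pooled(shape, pool):
    # MaxPooling2D with default 'valid' padding floor-divides each spatial dim
    return tuple(s // p for s, p in zip(shape, pool))

# Small (23x23) input: note pool_size=(1, 2) halves only the width.
shape = pooled((23, 23), (1, 2))   # first block  -> (23, 11)
shape = pooled(shape, (2, 2))      # second block -> (11, 5)
print(shape)

# Large (101x101) input: (101, 50) after the first block, (50, 25) after the second.
print(pooled(pooled((101, 101), (1, 2)), (2, 2)))
```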
# ------------------------------Small------------------------------
small_mobilenet_model = mobilenet_lite()
small_mobilenet_model.compile(optimizer='adam',
                              loss='sparse_categorical_crossentropy',
                              metrics=['accuracy'])
small_mobilenet_model.summary()
# ------------------------------Large------------------------------
large_mobilenet_model = mobilenet_lite(input_shape=(101, 101, 1))
large_mobilenet_model.compile(optimizer='adam',
                              loss='sparse_categorical_crossentropy',
                              metrics=['accuracy'])
large_mobilenet_model.summary()
Model: "sequential_38"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ rescaling (Rescaling) │ (None, 23, 23, 1) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d │ (None, 23, 23, 32) │ 73 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_68 │ (None, 23, 23, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_24 (MaxPooling2D) │ (None, 23, 11, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_1 │ (None, 23, 11, 64) │ 2,400 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_69 │ (None, 23, 11, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_25 (MaxPooling2D) │ (None, 11, 5, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_2 │ (None, 11, 5, 128) │ 8,896 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_70 │ (None, 11, 5, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_18 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_59 (Dense) │ (None, 64) │ 8,256 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_42 (Dropout) │ (None, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_60 (Dense) │ (None, 11) │ 715 │ 
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 21,236 (82.95 KB)
Trainable params: 20,788 (81.20 KB)
Non-trainable params: 448 (1.75 KB)
Model: "sequential_39"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ rescaling_1 (Rescaling) │ (None, 101, 101, 1) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_3 │ (None, 101, 101, 32) │ 73 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_71 │ (None, 101, 101, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_26 (MaxPooling2D) │ (None, 101, 50, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_4 │ (None, 101, 50, 64) │ 2,400 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_72 │ (None, 101, 50, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_27 (MaxPooling2D) │ (None, 50, 25, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_5 │ (None, 50, 25, 128) │ 8,896 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_73 │ (None, 50, 25, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_19 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_61 (Dense) │ (None, 64) │ 8,256 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_43 (Dropout) │ (None, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_62 (Dense) │ (None, 11) │ 715 │ 
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 21,236 (82.95 KB)
Trainable params: 20,788 (81.20 KB)
Non-trainable params: 448 (1.75 KB)
# ------------------------------Small------------------------------
small_mobilenet_history = small_mobilenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['MobileNet'] = small_mobilenet_history.history  # Save training history
# ------------------------------Large------------------------------
large_mobilenet_history = large_mobilenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['MobileNet'] = large_mobilenet_history.history  # Save training history
Epoch 1/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 12s 16ms/step - accuracy: 0.2149 - loss: 2.2140 - val_accuracy: 0.0909 - val_loss: 2.4556 - learning_rate: 0.0010 Epoch 2/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3719 - loss: 1.7944 - val_accuracy: 0.1182 - val_loss: 2.6646 - learning_rate: 0.0010 Epoch 3/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4593 - loss: 1.5753 - val_accuracy: 0.2095 - val_loss: 2.9365 - learning_rate: 0.0010 Epoch 4/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5228 - loss: 1.3817 - val_accuracy: 0.2432 - val_loss: 2.3082 - learning_rate: 0.0010 Epoch 5/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5737 - loss: 1.2547 - val_accuracy: 0.3391 - val_loss: 2.3796 - learning_rate: 0.0010 Epoch 6/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5959 - loss: 1.1978 - val_accuracy: 0.1314 - val_loss: 7.8764 - learning_rate: 0.0010 Epoch 7/30 319/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6278 - loss: 1.1042 Epoch 7: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6279 - loss: 1.1041 - val_accuracy: 0.3191 - val_loss: 2.7673 - learning_rate: 0.0010 Epoch 8/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6644 - loss: 0.9989 - val_accuracy: 0.4714 - val_loss: 1.6653 - learning_rate: 5.0000e-04 Epoch 9/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6630 - loss: 0.9911 - val_accuracy: 0.2505 - val_loss: 2.7174 - learning_rate: 5.0000e-04 Epoch 10/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6798 - loss: 0.9486 - val_accuracy: 0.2200 - val_loss: 6.9227 - learning_rate: 5.0000e-04 Epoch 11/30 315/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6982 - loss: 0.9058 Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6981 - loss: 0.9059 - val_accuracy: 0.0909 - val_loss: 15.9827 - learning_rate: 5.0000e-04 Epoch 12/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7132 - loss: 0.8591 - val_accuracy: 0.2618 - val_loss: 3.5090 - learning_rate: 2.5000e-04 Epoch 13/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7146 - loss: 0.8566 - val_accuracy: 0.4350 - val_loss: 1.8526 - learning_rate: 2.5000e-04 Epoch 14/30 316/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7133 - loss: 0.8606 Epoch 14: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7134 - loss: 0.8598 - val_accuracy: 0.4341 - val_loss: 1.8461 - learning_rate: 2.5000e-04 Epoch 15/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7281 - loss: 0.8202 - val_accuracy: 0.5832 - val_loss: 1.2580 - learning_rate: 1.2500e-04 Epoch 16/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7290 - loss: 0.8125 - val_accuracy: 0.6923 - val_loss: 0.9420 - learning_rate: 1.2500e-04 Epoch 17/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7365 - loss: 0.7990 - val_accuracy: 0.3668 - val_loss: 3.1542 - learning_rate: 1.2500e-04 Epoch 18/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7368 - loss: 0.7989 - val_accuracy: 0.5677 - val_loss: 1.2543 - learning_rate: 1.2500e-04 Epoch 19/30 318/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7329 - loss: 0.7918 Epoch 19: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7329 - loss: 0.7916 - val_accuracy: 0.5286 - val_loss: 1.4801 - learning_rate: 1.2500e-04 Epoch 1/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 30ms/step - accuracy: 0.2471 - loss: 2.1513 - val_accuracy: 0.0909 - val_loss: 2.4715 - learning_rate: 0.0010 Epoch 2/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 12s 14ms/step - accuracy: 0.4029 - loss: 1.7278 - val_accuracy: 0.0914 - val_loss: 6.5415 - learning_rate: 0.0010 Epoch 3/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.5058 - loss: 1.4302 - val_accuracy: 0.1455 - val_loss: 8.3258 - learning_rate: 0.0010 Epoch 4/30 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5647 - loss: 1.2593 Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.5648 - loss: 1.2590 - val_accuracy: 0.0909 - val_loss: 90.9654 - learning_rate: 0.0010 Epoch 5/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6206 - loss: 1.1180 - val_accuracy: 0.2245 - val_loss: 4.8245 - learning_rate: 5.0000e-04 Epoch 6/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6430 - loss: 1.0522 - val_accuracy: 0.2264 - val_loss: 8.3670 - learning_rate: 5.0000e-04 Epoch 7/30 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6510 - loss: 1.0162 Epoch 7: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6511 - loss: 1.0160 - val_accuracy: 0.1273 - val_loss: 22.6868 - learning_rate: 5.0000e-04 Epoch 8/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6840 - loss: 0.9573 - val_accuracy: 0.4150 - val_loss: 1.7806 - learning_rate: 2.5000e-04 Epoch 9/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.6873 - loss: 0.9237 - val_accuracy: 0.1227 - val_loss: 7.0741 - learning_rate: 2.5000e-04 Epoch 10/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6919 - loss: 0.9039 - val_accuracy: 0.2059 - val_loss: 7.9217 - learning_rate: 2.5000e-04 Epoch 11/30 327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7032 - loss: 0.8893 Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.7033 - loss: 0.8892 - val_accuracy: 0.2609 - val_loss: 3.9283 - learning_rate: 2.5000e-04 Epoch 12/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 6s 17ms/step - accuracy: 0.7097 - loss: 0.8478 - val_accuracy: 0.3895 - val_loss: 3.2631 - learning_rate: 1.2500e-04 Epoch 13/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 9s 14ms/step - accuracy: 0.7233 - loss: 0.8342 - val_accuracy: 0.4359 - val_loss: 1.6465 - learning_rate: 1.2500e-04 Epoch 14/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 16ms/step - accuracy: 0.7294 - loss: 0.8297 - val_accuracy: 0.6632 - val_loss: 0.9992 - learning_rate: 1.2500e-04 Epoch 15/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 10s 14ms/step - accuracy: 0.7308 - loss: 0.8294 - val_accuracy: 0.4536 - val_loss: 1.7776 - learning_rate: 1.2500e-04 Epoch 16/30 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7238 - loss: 0.8322 - val_accuracy: 0.5450 - val_loss: 1.2672 - learning_rate: 1.2500e-04 Epoch 17/30 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7232 - loss: 0.8129 Epoch 17: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.7232 - loss: 0.8128 - val_accuracy: 0.3795 - val_loss: 2.5536 - learning_rate: 1.2500e-04
# ------------------------------Small------------------------------
aug_small_mobilenet_model = mobilenet_lite()
aug_small_mobilenet_model.compile(optimizer='adam',
                                  loss='sparse_categorical_crossentropy',
                                  metrics=['accuracy'])
aug_small_mobilenet_model.summary()
# ------------------------------Large------------------------------
aug_large_mobilenet_model = mobilenet_lite(input_shape=(101, 101, 1))
aug_large_mobilenet_model.compile(optimizer='adam',
                                  loss='sparse_categorical_crossentropy',
                                  metrics=['accuracy'])
aug_large_mobilenet_model.summary()
Model: "sequential_40"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ rescaling_2 (Rescaling) │ (None, 23, 23, 1) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_6 │ (None, 23, 23, 32) │ 73 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_74 │ (None, 23, 23, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_28 (MaxPooling2D) │ (None, 23, 11, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_7 │ (None, 23, 11, 64) │ 2,400 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_75 │ (None, 23, 11, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_29 (MaxPooling2D) │ (None, 11, 5, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_8 │ (None, 11, 5, 128) │ 8,896 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_76 │ (None, 11, 5, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_20 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_63 (Dense) │ (None, 64) │ 8,256 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_44 (Dropout) │ (None, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_64 (Dense) │ (None, 11) │ 715 │ 
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 21,236 (82.95 KB)
Trainable params: 20,788 (81.20 KB)
Non-trainable params: 448 (1.75 KB)
Model: "sequential_41"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ rescaling_3 (Rescaling) │ (None, 101, 101, 1) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_9 │ (None, 101, 101, 32) │ 73 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_77 │ (None, 101, 101, 32) │ 128 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_30 (MaxPooling2D) │ (None, 101, 50, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_10 │ (None, 101, 50, 64) │ 2,400 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_78 │ (None, 101, 50, 64) │ 256 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_31 (MaxPooling2D) │ (None, 50, 25, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ separable_conv2d_11 │ (None, 50, 25, 128) │ 8,896 │ │ (SeparableConv2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization_79 │ (None, 50, 25, 128) │ 512 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ global_average_pooling2d_21 │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_65 (Dense) │ (None, 64) │ 8,256 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_45 (Dropout) │ (None, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_66 (Dense) │ (None, 11) │ 715 │ 
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 21,236 (82.95 KB)
Trainable params: 20,788 (81.20 KB)
Non-trainable params: 448 (1.75 KB)
# ------------------------------Small------------------------------
aug_small_mobilenet_history = aug_small_mobilenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['MobileNet with Augmented Data'] = aug_small_mobilenet_history.history  # Save training history
# ------------------------------Large------------------------------
aug_large_mobilenet_history = aug_large_mobilenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['MobileNet with Augmented Data'] = aug_large_mobilenet_history.history  # Save training history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - accuracy: 0.2225 - loss: 2.1902 - val_accuracy: 0.0909 - val_loss: 2.4308 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3473 - loss: 1.8656 - val_accuracy: 0.1150 - val_loss: 4.0065 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4346 - loss: 1.6383 - val_accuracy: 0.2936 - val_loss: 2.3334 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4884 - loss: 1.4905 - val_accuracy: 0.2114 - val_loss: 4.8738 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5347 - loss: 1.3611 - val_accuracy: 0.1623 - val_loss: 4.5721 - learning_rate: 0.0010 Epoch 6/20 321/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.5648 - loss: 1.2729 Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.5650 - loss: 1.2726 - val_accuracy: 0.0968 - val_loss: 17.3579 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.5962 - loss: 1.1879 - val_accuracy: 0.0905 - val_loss: 22.6661 - learning_rate: 5.0000e-04 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6145 - loss: 1.1455 - val_accuracy: 0.1295 - val_loss: 13.6056 - learning_rate: 5.0000e-04 Epoch 9/20 313/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6216 - loss: 1.1104 Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6215 - loss: 1.1103 - val_accuracy: 0.2245 - val_loss: 5.8888 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6353 - loss: 1.0613 - val_accuracy: 0.1832 - val_loss: 4.1779 - learning_rate: 2.5000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6488 - loss: 1.0499 - val_accuracy: 0.2500 - val_loss: 3.8112 - learning_rate: 2.5000e-04 Epoch 12/20 316/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6490 - loss: 1.0331 Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6490 - loss: 1.0332 - val_accuracy: 0.1382 - val_loss: 9.2115 - learning_rate: 2.5000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6432 - loss: 1.0368 - val_accuracy: 0.6191 - val_loss: 1.1213 - learning_rate: 1.2500e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6657 - loss: 0.9905 - val_accuracy: 0.4255 - val_loss: 1.8480 - learning_rate: 1.2500e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6654 - loss: 0.9999 - val_accuracy: 0.0909 - val_loss: 27.1148 - learning_rate: 1.2500e-04 Epoch 16/20 322/328 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6733 - loss: 0.9744 Epoch 16: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6731 - loss: 0.9747 - val_accuracy: 0.1450 - val_loss: 42.2199 - learning_rate: 1.2500e-04 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 28ms/step - accuracy: 0.2256 - loss: 2.1789 - val_accuracy: 0.0909 - val_loss: 2.4568 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.3973 - loss: 1.7607 - val_accuracy: 0.1045 - val_loss: 5.2101 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.4876 - loss: 1.4821 - val_accuracy: 0.0909 - val_loss: 13.3821 - learning_rate: 0.0010 Epoch 4/20 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.5610 - loss: 1.2910 Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.5611 - loss: 1.2905 - val_accuracy: 0.2300 - val_loss: 3.4362 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6232 - loss: 1.1182 - val_accuracy: 0.3105 - val_loss: 2.3407 - learning_rate: 5.0000e-04 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6376 - loss: 1.0560 - val_accuracy: 0.5550 - val_loss: 1.1949 - learning_rate: 5.0000e-04 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.6590 - loss: 0.9879 - val_accuracy: 0.3632 - val_loss: 1.9190 - learning_rate: 5.0000e-04 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.6763 - loss: 0.9465 - val_accuracy: 0.3223 - val_loss: 2.8640 - learning_rate: 5.0000e-04 Epoch 9/20 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7000 - loss: 0.8820 Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7000 - loss: 0.8820 - val_accuracy: 0.3686 - val_loss: 2.1724 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7074 - loss: 0.8467 - val_accuracy: 0.4118 - val_loss: 3.7875 - learning_rate: 2.5000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7287 - loss: 0.8173 - val_accuracy: 0.5345 - val_loss: 1.4572 - learning_rate: 2.5000e-04 Epoch 12/20 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7374 - loss: 0.7773 Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7374 - loss: 0.7773 - val_accuracy: 0.5586 - val_loss: 1.3115 - learning_rate: 2.5000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7400 - loss: 0.7568 - val_accuracy: 0.7359 - val_loss: 0.7604 - learning_rate: 1.2500e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7515 - loss: 0.7557 - val_accuracy: 0.3973 - val_loss: 2.3049 - learning_rate: 1.2500e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7557 - loss: 0.7239 - val_accuracy: 0.7114 - val_loss: 0.8436 - learning_rate: 1.2500e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7640 - loss: 0.7311 - val_accuracy: 0.7441 - val_loss: 0.7242 - learning_rate: 1.2500e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7627 - loss: 0.7163 - val_accuracy: 0.4955 - val_loss: 1.7038 - learning_rate: 1.2500e-04 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7656 - loss: 0.7222 - val_accuracy: 0.4186 - val_loss: 2.2651 - learning_rate: 1.2500e-04 Epoch 19/20 325/328 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7631 - loss: 0.7192 Epoch 19: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.7631 - loss: 0.7191 - val_accuracy: 0.3082 - val_loss: 3.9644 - learning_rate: 1.2500e-04
5. Mini-DenseNet-inspired model¶
- Mini-DenseNet-inspired model
Begins with a Conv2D layer (16 filters) followed by two Dense Blocks.
Each Dense Block:
Contains 3 convolutional layers with a growth rate of 12, each following the BatchNormalization → ReLU → Conv2D ordering. The output of every layer is concatenated with the feature maps of all previous layers (the key DenseNet trait).
A MaxPooling2D layer with pool size (1, 2) follows the first block, and GlobalAveragePooling2D sits at the end.
Final classifier: Dense(64) → Dropout(0.3) → softmax output.
def densenet_block(x, growth_rate, layers):
    for _ in range(layers):
        # BN -> ReLU -> Conv ordering, as in DenseNet
        out = tf.keras.layers.BatchNormalization()(x)
        out = tf.keras.layers.ReLU()(out)
        out = tf.keras.layers.Conv2D(growth_rate, (3, 3), padding='same')(out)
        # Concatenate new features with all previous feature maps
        x = tf.keras.layers.Concatenate()([x, out])
    return x

def mini_densenet(input_shape=(23, 23, 1), num_classes=11):
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, (3, 3), padding='same')(inputs)
    x = densenet_block(x, growth_rate=12, layers=3)
    x = tf.keras.layers.MaxPooling2D(pool_size=(1, 2))(x)
    x = densenet_block(x, growth_rate=12, layers=3)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(64, activation='relu')(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
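Because each dense-block layer concatenates its `growth_rate` new channels onto the running feature map, the channel count is easy to predict by hand. A minimal sketch (the helper `channels_after_block` is illustrative, not part of the model code) that reproduces the channel widths seen in the summaries below:

```python
# Sketch: channel growth through mini_densenet (growth_rate=12, 3 layers/block).
def channels_after_block(in_channels, growth_rate, layers):
    # Each layer concatenates growth_rate new channels onto the input.
    return in_channels + growth_rate * layers

c = 16                                  # initial Conv2D filters
c = channels_after_block(c, 12, 3)      # after first dense block -> 52
# MaxPooling2D((1, 2)) halves the width only; channels are unchanged.
c = channels_after_block(c, 12, 3)      # after second dense block -> 88
print(c)                                # 88 channels feed GlobalAveragePooling2D
```

This matches the summary tables: 16 → 28 → 40 → 52 channels in the first block, then 52 → 64 → 76 → 88 in the second.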
# ------------------------------Small------------------------------
small_densenet_model = mini_densenet()
small_densenet_model.compile(optimizer='adam',
                             loss='sparse_categorical_crossentropy',
                             metrics=['accuracy'])
small_densenet_model.summary()

# ------------------------------Large------------------------------
large_densenet_model = mini_densenet(input_shape=(101, 101, 1))
large_densenet_model.compile(optimizer='adam',
                             loss='sparse_categorical_crossentropy',
                             metrics=['accuracy'])
large_densenet_model.summary()
Model: "functional_46"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_46 │ (None, 23, 23, 1) │ 0 │ - │ │ (InputLayer) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_108 (Conv2D) │ (None, 23, 23, │ 160 │ input_layer_46[0… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 64 │ conv2d_108[0][0] │ │ (BatchNormalizatio… │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu (ReLU) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_109 (Conv2D) │ (None, 23, 23, │ 1,740 │ re_lu[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate │ (None, 23, 23, │ 0 │ conv2d_108[0][0], │ │ (Concatenate) │ 28) │ │ conv2d_109[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 112 │ concatenate[0][0] │ │ (BatchNormalizatio… │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_1 (ReLU) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_110 (Conv2D) │ (None, 23, 23, │ 3,036 │ re_lu_1[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_1 │ (None, 23, 23, │ 0 │ concatenate[0][0… │ │ (Concatenate) │ 40) │ │ conv2d_110[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 160 │ concatenate_1[0]… │ │ (BatchNormalizatio… │ 40) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_2 (ReLU) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 40) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_111 (Conv2D) │ (None, 23, 23, │ 4,332 │ re_lu_2[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_2 │ (None, 23, 23, │ 0 │ concatenate_1[0]… │ │ (Concatenate) │ 52) │ │ conv2d_111[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_32 │ (None, 23, 11, │ 0 │ concatenate_2[0]… │ │ (MaxPooling2D) │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 208 │ max_pooling2d_32… │ │ (BatchNormalizatio… │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_3 (ReLU) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_112 (Conv2D) │ (None, 23, 11, │ 5,628 │ re_lu_3[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_3 │ (None, 23, 11, │ 0 │ max_pooling2d_32… │ │ (Concatenate) │ 64) │ │ conv2d_112[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 256 │ concatenate_3[0]… │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_4 (ReLU) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_113 (Conv2D) │ (None, 23, 11, │ 6,924 │ re_lu_4[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_4 │ (None, 23, 11, │ 0 │ concatenate_3[0]… │ │ (Concatenate) │ 76) │ │ conv2d_113[0][0] │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 304 │ concatenate_4[0]… │ │ (BatchNormalizatio… │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_5 (ReLU) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_114 (Conv2D) │ (None, 23, 11, │ 8,220 │ re_lu_5[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_5 │ (None, 23, 11, │ 0 │ concatenate_4[0]… │ │ (Concatenate) │ 88) │ │ conv2d_114[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 88) │ 0 │ concatenate_5[0]… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_67 (Dense) │ (None, 64) │ 5,696 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_46 │ (None, 64) │ 0 │ dense_67[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_68 (Dense) │ (None, 11) │ 715 │ dropout_46[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 37,555 (146.70 KB)
Trainable params: 37,003 (144.54 KB)
Non-trainable params: 552 (2.16 KB)
Model: "functional_47"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_47 │ (None, 101, 101, │ 0 │ - │ │ (InputLayer) │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_115 (Conv2D) │ (None, 101, 101, │ 160 │ input_layer_47[0… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 64 │ conv2d_115[0][0] │ │ (BatchNormalizatio… │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_6 (ReLU) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_116 (Conv2D) │ (None, 101, 101, │ 1,740 │ re_lu_6[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_6 │ (None, 101, 101, │ 0 │ conv2d_115[0][0], │ │ (Concatenate) │ 28) │ │ conv2d_116[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 112 │ concatenate_6[0]… │ │ (BatchNormalizatio… │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_7 (ReLU) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_117 (Conv2D) │ (None, 101, 101, │ 3,036 │ re_lu_7[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_7 │ (None, 101, 101, │ 0 │ concatenate_6[0]… │ │ (Concatenate) │ 40) │ │ conv2d_117[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 160 │ concatenate_7[0]… │ │ (BatchNormalizatio… │ 40) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_8 (ReLU) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 40) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_118 (Conv2D) │ (None, 101, 101, │ 4,332 │ re_lu_8[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_8 │ (None, 101, 101, │ 0 │ concatenate_7[0]… │ │ (Concatenate) │ 52) │ │ conv2d_118[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_33 │ (None, 101, 50, │ 0 │ concatenate_8[0]… │ │ (MaxPooling2D) │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 208 │ max_pooling2d_33… │ │ (BatchNormalizatio… │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_9 (ReLU) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_119 (Conv2D) │ (None, 101, 50, │ 5,628 │ re_lu_9[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_9 │ (None, 101, 50, │ 0 │ max_pooling2d_33… │ │ (Concatenate) │ 64) │ │ conv2d_119[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 256 │ concatenate_9[0]… │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_10 (ReLU) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_120 (Conv2D) │ (None, 101, 50, │ 6,924 │ re_lu_10[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_10 │ (None, 101, 50, │ 0 │ concatenate_9[0]… │ │ (Concatenate) │ 76) │ │ 
conv2d_120[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 304 │ concatenate_10[0… │ │ (BatchNormalizatio… │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_11 (ReLU) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_121 (Conv2D) │ (None, 101, 50, │ 8,220 │ re_lu_11[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_11 │ (None, 101, 50, │ 0 │ concatenate_10[0… │ │ (Concatenate) │ 88) │ │ conv2d_121[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 88) │ 0 │ concatenate_11[0… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_69 (Dense) │ (None, 64) │ 5,696 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_47 │ (None, 64) │ 0 │ dense_69[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_70 (Dense) │ (None, 11) │ 715 │ dropout_47[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 37,555 (146.70 KB)
Trainable params: 37,003 (144.54 KB)
Non-trainable params: 552 (2.16 KB)
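The per-layer parameter counts in the summaries can be checked with the standard Keras formulas: a Conv2D layer has (kh × kw × in_channels + 1) × filters parameters, and a Dense layer has (in_units + 1) × out_units. A small sketch (helper names are illustrative) verifying a few rows of the tables above:

```python
# Sketch: hand-verifying parameter counts from the model summaries.
def conv2d_params(kh, kw, in_ch, filters):
    # kernel weights plus one bias per filter
    return (kh * kw * in_ch + 1) * filters

def dense_params(in_units, out_units):
    # weight matrix plus one bias per output unit
    return (in_units + 1) * out_units

print(conv2d_params(3, 3, 1, 16))    # 160   (first Conv2D)
print(conv2d_params(3, 3, 16, 12))   # 1,740 (first dense-block conv)
print(dense_params(88, 64))          # 5,696 (Dense(64) after pooling)
print(dense_params(64, 11))          # 715   (softmax classifier)
```

Note that both input sizes yield identical totals (37,555): GlobalAveragePooling2D collapses any spatial resolution to one value per channel, so the parameter count is independent of the 23×23 vs. 101×101 input.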
# ------------------------------Small------------------------------
small_densenet_history = small_densenet_model.fit(
    small_train,
    validation_data=small_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
small_history_dict['DenseNet'] = small_densenet_history.history

# ------------------------------Large------------------------------
large_densenet_history = large_densenet_model.fit(
    large_train,
    validation_data=large_val,
    epochs=20,
    callbacks=[early_stop, reduce_lr]
)
large_history_dict['DenseNet'] = large_densenet_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 18s 29ms/step - accuracy: 0.1992 - loss: 2.2404 - val_accuracy: 0.0909 - val_loss: 5.9046 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.3300 - loss: 1.8944 - val_accuracy: 0.2041 - val_loss: 3.4966 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4359 - loss: 1.6202 - val_accuracy: 0.4418 - val_loss: 1.6227 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.4902 - loss: 1.4498 - val_accuracy: 0.2582 - val_loss: 2.7828 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5421 - loss: 1.3194 - val_accuracy: 0.5755 - val_loss: 1.2331 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5631 - loss: 1.2467 - val_accuracy: 0.3141 - val_loss: 2.3984 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5881 - loss: 1.1691 - val_accuracy: 0.2750 - val_loss: 6.2530 - learning_rate: 0.0010 Epoch 8/20 319/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6199 - loss: 1.1219 Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6203 - loss: 1.1206 - val_accuracy: 0.3900 - val_loss: 2.1949 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6743 - loss: 0.9559 - val_accuracy: 0.5814 - val_loss: 1.2654 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6864 - loss: 0.9254 - val_accuracy: 0.5464 - val_loss: 1.2873 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6914 - loss: 0.8922 - val_accuracy: 0.6073 - val_loss: 1.1625 - learning_rate: 5.0000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7005 - loss: 0.8620 - val_accuracy: 0.7427 - val_loss: 0.7916 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7095 - loss: 0.8525 - val_accuracy: 0.6555 - val_loss: 1.0677 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7252 - loss: 0.8034 - val_accuracy: 0.7382 - val_loss: 0.7487 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7186 - loss: 0.8082 - val_accuracy: 0.6409 - val_loss: 1.1873 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7427 - loss: 0.7606 - val_accuracy: 0.6200 - val_loss: 1.1299 - learning_rate: 5.0000e-04 Epoch 17/20 323/328 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7413 - loss: 0.7479 Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7413 - loss: 0.7479 - val_accuracy: 0.6768 - val_loss: 0.9252 - learning_rate: 5.0000e-04 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 39s 82ms/step - accuracy: 0.2171 - loss: 2.1935 - val_accuracy: 0.0909 - val_loss: 8.1020 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.3760 - loss: 1.7215 - val_accuracy: 0.3018 - val_loss: 2.0443 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.4666 - loss: 1.4835 - val_accuracy: 0.1886 - val_loss: 6.5382 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.4930 - loss: 1.4046 - val_accuracy: 0.4291 - val_loss: 1.7117 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.5306 - loss: 1.2862 - val_accuracy: 0.4023 - val_loss: 1.8823 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 46ms/step - accuracy: 0.5865 - loss: 1.1891 - val_accuracy: 0.4055 - val_loss: 1.6295 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.6067 - loss: 1.0852 - val_accuracy: 0.5636 - val_loss: 1.2996 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.6254 - loss: 1.0600 - val_accuracy: 0.5659 - val_loss: 1.2756 - learning_rate: 0.0010 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.6571 - loss: 0.9766 - val_accuracy: 0.6955 - val_loss: 0.8901 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 45ms/step - accuracy: 0.6663 - loss: 0.9188 - val_accuracy: 0.4073 - val_loss: 2.1295 - learning_rate: 0.0010 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 46ms/step - accuracy: 0.6958 - loss: 0.8733 - val_accuracy: 0.4500 - val_loss: 2.7203 - learning_rate: 0.0010 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7168 - loss: 0.8161 Epoch 12: ReduceLROnPlateau reducing learning rate to 
0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 47ms/step - accuracy: 0.7168 - loss: 0.8161 - val_accuracy: 0.6377 - val_loss: 1.0256 - learning_rate: 0.0010 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.7770 - loss: 0.6596 - val_accuracy: 0.7573 - val_loss: 0.7120 - learning_rate: 5.0000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.7799 - loss: 0.6523 - val_accuracy: 0.7664 - val_loss: 0.6801 - learning_rate: 5.0000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 47ms/step - accuracy: 0.7875 - loss: 0.6309 - val_accuracy: 0.6327 - val_loss: 1.3623 - learning_rate: 5.0000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.7987 - loss: 0.5925 - val_accuracy: 0.6864 - val_loss: 0.9562 - learning_rate: 5.0000e-04 Epoch 17/20 327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8130 - loss: 0.5640 Epoch 17: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.8130 - loss: 0.5640 - val_accuracy: 0.6768 - val_loss: 1.1052 - learning_rate: 5.0000e-04
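The `early_stop` and `reduce_lr` callbacks used in the fit calls are defined earlier in the notebook. A configuration consistent with the logs (the learning rate halves 1e-3 → 5e-4 → 2.5e-4 → …, and reductions/stops trigger after roughly three epochs without val_loss improvement) would be a sketch like this; the exact `patience` and `restore_best_weights` values are assumptions, not confirmed by this section:

```python
import tensorflow as tf

# Assumed callback settings, inferred from the training logs above:
# factor=0.5 (LR halves at each plateau), patience=3 (reductions fire
# after ~3 stagnant epochs). restore_best_weights is a guess.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=3, verbose=1)
```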
# ------------------------------Small------------------------------
aug_small_densenet_model = mini_densenet()
aug_small_densenet_model.compile(optimizer='adam',
                                 loss='sparse_categorical_crossentropy',
                                 metrics=['accuracy'])
aug_small_densenet_model.summary()

# ------------------------------Large------------------------------
aug_large_densenet_model = mini_densenet(input_shape=(101, 101, 1))
aug_large_densenet_model.compile(optimizer='adam',
                                 loss='sparse_categorical_crossentropy',
                                 metrics=['accuracy'])
aug_large_densenet_model.summary()
Model: "functional_48"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_48 │ (None, 23, 23, 1) │ 0 │ - │ │ (InputLayer) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_122 (Conv2D) │ (None, 23, 23, │ 160 │ input_layer_48[0… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 64 │ conv2d_122[0][0] │ │ (BatchNormalizatio… │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_12 (ReLU) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_123 (Conv2D) │ (None, 23, 23, │ 1,740 │ re_lu_12[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_12 │ (None, 23, 23, │ 0 │ conv2d_122[0][0], │ │ (Concatenate) │ 28) │ │ conv2d_123[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 112 │ concatenate_12[0… │ │ (BatchNormalizatio… │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_13 (ReLU) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_124 (Conv2D) │ (None, 23, 23, │ 3,036 │ re_lu_13[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_13 │ (None, 23, 23, │ 0 │ concatenate_12[0… │ │ (Concatenate) │ 40) │ │ conv2d_124[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 23, │ 160 │ concatenate_13[0… │ │ (BatchNormalizatio… │ 40) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_14 (ReLU) │ (None, 23, 23, │ 0 │ batch_normalizat… │ │ │ 40) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_125 (Conv2D) │ (None, 23, 23, │ 4,332 │ re_lu_14[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_14 │ (None, 23, 23, │ 0 │ concatenate_13[0… │ │ (Concatenate) │ 52) │ │ conv2d_125[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_34 │ (None, 23, 11, │ 0 │ concatenate_14[0… │ │ (MaxPooling2D) │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 208 │ max_pooling2d_34… │ │ (BatchNormalizatio… │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_15 (ReLU) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_126 (Conv2D) │ (None, 23, 11, │ 5,628 │ re_lu_15[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_15 │ (None, 23, 11, │ 0 │ max_pooling2d_34… │ │ (Concatenate) │ 64) │ │ conv2d_126[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 256 │ concatenate_15[0… │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_16 (ReLU) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_127 (Conv2D) │ (None, 23, 11, │ 6,924 │ re_lu_16[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_16 │ (None, 23, 11, │ 0 │ concatenate_15[0… │ │ (Concatenate) │ 76) │ │ conv2d_127[0][0] │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 23, 11, │ 304 │ concatenate_16[0… │ │ (BatchNormalizatio… │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_17 (ReLU) │ (None, 23, 11, │ 0 │ batch_normalizat… │ │ │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_128 (Conv2D) │ (None, 23, 11, │ 8,220 │ re_lu_17[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_17 │ (None, 23, 11, │ 0 │ concatenate_16[0… │ │ (Concatenate) │ 88) │ │ conv2d_128[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 88) │ 0 │ concatenate_17[0… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_71 (Dense) │ (None, 64) │ 5,696 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_48 │ (None, 64) │ 0 │ dense_71[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_72 (Dense) │ (None, 11) │ 715 │ dropout_48[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 37,555 (146.70 KB)
Trainable params: 37,003 (144.54 KB)
Non-trainable params: 552 (2.16 KB)
Model: "functional_49"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_49 │ (None, 101, 101, │ 0 │ - │ │ (InputLayer) │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_129 (Conv2D) │ (None, 101, 101, │ 160 │ input_layer_49[0… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 64 │ conv2d_129[0][0] │ │ (BatchNormalizatio… │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_18 (ReLU) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 16) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_130 (Conv2D) │ (None, 101, 101, │ 1,740 │ re_lu_18[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_18 │ (None, 101, 101, │ 0 │ conv2d_129[0][0], │ │ (Concatenate) │ 28) │ │ conv2d_130[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 112 │ concatenate_18[0… │ │ (BatchNormalizatio… │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_19 (ReLU) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 28) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_131 (Conv2D) │ (None, 101, 101, │ 3,036 │ re_lu_19[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_19 │ (None, 101, 101, │ 0 │ concatenate_18[0… │ │ (Concatenate) │ 40) │ │ conv2d_131[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 101, │ 160 │ concatenate_19[0… │ │ (BatchNormalizatio… │ 40) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_20 (ReLU) │ (None, 101, 101, │ 0 │ batch_normalizat… │ │ │ 40) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_132 (Conv2D) │ (None, 101, 101, │ 4,332 │ re_lu_20[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_20 │ (None, 101, 101, │ 0 │ concatenate_19[0… │ │ (Concatenate) │ 52) │ │ conv2d_132[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_35 │ (None, 101, 50, │ 0 │ concatenate_20[0… │ │ (MaxPooling2D) │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 208 │ max_pooling2d_35… │ │ (BatchNormalizatio… │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_21 (ReLU) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 52) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_133 (Conv2D) │ (None, 101, 50, │ 5,628 │ re_lu_21[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_21 │ (None, 101, 50, │ 0 │ max_pooling2d_35… │ │ (Concatenate) │ 64) │ │ conv2d_133[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 256 │ concatenate_21[0… │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_22 (ReLU) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_134 (Conv2D) │ (None, 101, 50, │ 6,924 │ re_lu_22[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_22 │ (None, 101, 50, │ 0 │ concatenate_21[0… │ │ (Concatenate) │ 76) │ │ 
conv2d_134[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 101, 50, │ 304 │ concatenate_22[0… │ │ (BatchNormalizatio… │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ re_lu_23 (ReLU) │ (None, 101, 50, │ 0 │ batch_normalizat… │ │ │ 76) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_135 (Conv2D) │ (None, 101, 50, │ 8,220 │ re_lu_23[0][0] │ │ │ 12) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_23 │ (None, 101, 50, │ 0 │ concatenate_22[0… │ │ (Concatenate) │ 88) │ │ conv2d_135[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ global_average_poo… │ (None, 88) │ 0 │ concatenate_23[0… │ │ (GlobalAveragePool… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_73 (Dense) │ (None, 64) │ 5,696 │ global_average_p… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dropout_49 │ (None, 64) │ 0 │ dense_73[0][0] │ │ (Dropout) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ dense_74 (Dense) │ (None, 11) │ 715 │ dropout_49[0][0] │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 37,555 (146.70 KB)
Trainable params: 37,003 (144.54 KB)
Non-trainable params: 552 (2.16 KB)
# ------------------------------Small------------------------------
aug_small_densenet_history = aug_small_densenet_model.fit(
small_train,
validation_data=small_val,
epochs=20,
callbacks=[early_stop, reduce_lr]
)
small_history_dict['DenseNet with Augmented Data'] = aug_small_densenet_history.history
# ------------------------------Large------------------------------
aug_large_densenet_history = aug_large_densenet_model.fit(
large_train,
validation_data=large_val,
epochs=20,
callbacks=[early_stop, reduce_lr]
)
large_history_dict['DenseNet with Augmented Data'] = aug_large_densenet_history.history
Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 17s 26ms/step - accuracy: 0.1970 - loss: 2.2269 - val_accuracy: 0.0909 - val_loss: 4.2205 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.3451 - loss: 1.8844 - val_accuracy: 0.1768 - val_loss: 3.6344 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4385 - loss: 1.6101 - val_accuracy: 0.3036 - val_loss: 2.2559 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4908 - loss: 1.4303 - val_accuracy: 0.4486 - val_loss: 2.0633 - learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5318 - loss: 1.3078 - val_accuracy: 0.4741 - val_loss: 1.7668 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5789 - loss: 1.2072 - val_accuracy: 0.6218 - val_loss: 1.1075 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5984 - loss: 1.1355 - val_accuracy: 0.3705 - val_loss: 2.4384 - learning_rate: 0.0010 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6354 - loss: 1.0629 - val_accuracy: 0.4941 - val_loss: 1.5925 - learning_rate: 0.0010 Epoch 9/20 320/328 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6480 - loss: 1.0318 Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6482 - loss: 1.0314 - val_accuracy: 0.4523 - val_loss: 1.7354 - learning_rate: 0.0010 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6861 - loss: 0.9208 - val_accuracy: 0.5936 - val_loss: 1.2444 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7056 - loss: 0.8801 - val_accuracy: 0.5818 - val_loss: 1.3433 - learning_rate: 5.0000e-04 Epoch 12/20 320/328 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7092 - loss: 0.8502 Epoch 12: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7092 - loss: 0.8503 - val_accuracy: 0.5859 - val_loss: 1.3310 - learning_rate: 5.0000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7322 - loss: 0.8058 - val_accuracy: 0.7173 - val_loss: 0.8433 - learning_rate: 2.5000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7342 - loss: 0.7723 - val_accuracy: 0.7332 - val_loss: 0.7947 - learning_rate: 2.5000e-04 Epoch 15/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7469 - loss: 0.7575 - val_accuracy: 0.7214 - val_loss: 0.8824 - learning_rate: 2.5000e-04 Epoch 16/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7447 - loss: 0.7501 - val_accuracy: 0.7191 - val_loss: 0.8465 - learning_rate: 2.5000e-04 Epoch 17/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7482 - loss: 0.7494 - val_accuracy: 0.7423 - val_loss: 0.7785 - learning_rate: 2.5000e-04 Epoch 18/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7597 - loss: 0.7250 - val_accuracy: 0.7541 - val_loss: 0.7254 - learning_rate: 2.5000e-04 Epoch 19/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7552 - loss: 0.7328 - val_accuracy: 0.7495 - val_loss: 0.7868 - learning_rate: 2.5000e-04 Epoch 20/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7567 - loss: 0.7185 - val_accuracy: 0.7741 - val_loss: 0.6897 - learning_rate: 2.5000e-04 Epoch 1/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 31s 65ms/step - accuracy: 0.2395 - loss: 2.1515 - val_accuracy: 0.0909 - val_loss: 4.5025 - learning_rate: 0.0010 Epoch 2/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 48ms/step - accuracy: 0.4017 - loss: 1.6754 - val_accuracy: 0.2427 - val_loss: 3.1961 - learning_rate: 0.0010 Epoch 3/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.4706 - loss: 1.4717 - val_accuracy: 0.3914 - val_loss: 1.7922 - learning_rate: 0.0010 Epoch 4/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.5174 - loss: 1.3745 - val_accuracy: 0.2936 - val_loss: 2.6981 - 
learning_rate: 0.0010 Epoch 5/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.5553 - loss: 1.2520 - val_accuracy: 0.3127 - val_loss: 2.9928 - learning_rate: 0.0010 Epoch 6/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.5891 - loss: 1.1725 Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.5891 - loss: 1.1725 - val_accuracy: 0.4595 - val_loss: 2.5955 - learning_rate: 0.0010 Epoch 7/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 16s 47ms/step - accuracy: 0.6392 - loss: 1.0507 - val_accuracy: 0.5768 - val_loss: 1.2177 - learning_rate: 5.0000e-04 Epoch 8/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.6510 - loss: 1.0037 - val_accuracy: 0.3355 - val_loss: 4.3405 - learning_rate: 5.0000e-04 Epoch 9/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.6674 - loss: 0.9675 - val_accuracy: 0.5509 - val_loss: 1.3227 - learning_rate: 5.0000e-04 Epoch 10/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.6828 - loss: 0.9172 Epoch 10: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 
328/328 ━━━━━━━━━━━━━━━━━━━━ 21s 47ms/step - accuracy: 0.6828 - loss: 0.9172 - val_accuracy: 0.4532 - val_loss: 2.1191 - learning_rate: 5.0000e-04 Epoch 11/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.7049 - loss: 0.8696 - val_accuracy: 0.7400 - val_loss: 0.8087 - learning_rate: 2.5000e-04 Epoch 12/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.7361 - loss: 0.8097 - val_accuracy: 0.8055 - val_loss: 0.5951 - learning_rate: 2.5000e-04 Epoch 13/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 20s 45ms/step - accuracy: 0.7441 - loss: 0.7709 - val_accuracy: 0.5795 - val_loss: 1.2081 - learning_rate: 2.5000e-04 Epoch 14/20 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 47ms/step - accuracy: 0.7505 - loss: 0.7427 - val_accuracy: 0.4923 - val_loss: 1.8497 - learning_rate: 2.5000e-04 Epoch 15/20 327/328 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.7628 - loss: 0.7160 Epoch 15: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 328/328 ━━━━━━━━━━━━━━━━━━━━ 15s 46ms/step - accuracy: 0.7628 - loss: 0.7161 - val_accuracy: 0.6718 - val_loss: 0.9451 - learning_rate: 2.5000e-04
Model Evaluation¶
small_results_df = pd.DataFrame()
for name, history in small_history_dict.items():
final_train_acc = history['accuracy'][-1]
final_val_acc = history['val_accuracy'][-1]
new_row = pd.DataFrame({
'Model': [name],
'Final Train Accuracy': [final_train_acc],
'Final Val Accuracy': [final_val_acc],
})
small_results_df = pd.concat([small_results_df, new_row], ignore_index=True)
# Sort results by validation accuracy (optional)
small_results_df = small_results_df.sort_values(by='Final Val Accuracy', ascending=False).reset_index(drop=True)
# Function to highlight top 3 and bottom 3
def highlight_top_bottom(s):
sorted_idx = s.sort_values(ascending=False).index
styles = [''] * len(s)
# Top 3: shades of green
if len(s) >= 1: styles[sorted_idx[0]] = 'background-color: #a1d99b' # top
if len(s) >= 2: styles[sorted_idx[1]] = 'background-color: #c7e9c0'
if len(s) >= 3: styles[sorted_idx[2]] = 'background-color: #e5f5e0'
# Bottom 3: shades of red
if len(s) >= 1: styles[sorted_idx[-1]] = 'background-color: #fc9272' # worst
if len(s) >= 2: styles[sorted_idx[-2]] = 'background-color: #fcbba1'
if len(s) >= 3: styles[sorted_idx[-3]] = 'background-color: #fee0d2'
return styles
# Apply styling
styled_df = small_results_df.style.apply(highlight_top_bottom, subset=['Final Train Accuracy'])
styled_df = styled_df.apply(highlight_top_bottom, subset=['Final Val Accuracy'])
display(styled_df)
| | Model | Final Train Accuracy | Final Val Accuracy |
|---|---|---|---|
| 0 | Custom CNN | 0.921860 | 0.907727 |
| 1 | VGG | 0.915190 | 0.844091 |
| 2 | ResNet50 with Augmented Data | 0.956737 | 0.830000 |
| 3 | DenseNet with Augmented Data | 0.755289 | 0.774091 |
| 4 | ResNet50 | 0.935201 | 0.734545 |
| 5 | DenseNet | 0.741471 | 0.676818 |
| 6 | MobileNet | 0.735563 | 0.528636 |
| 7 | Custom CNN with Augmented Data | 0.786545 | 0.519545 |
| 8 | VGG with Augmented Data | 0.638841 | 0.513182 |
| 9 | Dummy Baseline | 0.377454 | 0.295455 |
| 10 | Dummy Baseline with Augmented Data | 0.251953 | 0.212273 |
| 11 | MobileNet with Augmented Data | 0.666857 | 0.145000 |
Top models for 23x23 Images¶
We can observe that the top 3 models (based on validation accuracy) are:
- Custom CNN
- VGG
- ResNet50 with Augmented Data
The worst three are:
- Dummy Baseline with Augmented Data
- Dummy Baseline
- MobileNet with Augmented Data
We can observe that the augmented models performed slightly worse than their non-augmented counterparts. This could be due to overaggressive transformations: rotations, shears, zooms, or random crops can obliterate the tiny features the network needs, especially at 23x23 where every pixel matters.
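As a rough illustration (pure NumPy, hypothetical values), even a modest 4-pixel width shift discards a large share of a 23x23 image:

```python
import numpy as np

# A hypothetical 23x23 "image" with content in every pixel.
img = np.ones((23, 23))

def shift_right(image, pixels):
    """Translate the image right by `pixels`, zero-filling the vacated
    columns (roughly what a width-shift augmentation does at the border)."""
    out = np.zeros_like(image)
    out[:, pixels:] = image[:, :-pixels]
    return out

shifted = shift_right(img, 4)
lost = 1.0 - shifted.sum() / img.sum()
print(f"Content lost to a 4-pixel shift: {lost:.1%}")  # ~17.4%
```

The same 4-pixel shift on a 101x101 image would lose only about 4%, which foreshadows why augmentation fares better at the larger resolution.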
large_results_df = pd.DataFrame()
for name, history in large_history_dict.items():
final_train_acc = history['accuracy'][-1]
final_val_acc = history['val_accuracy'][-1]
new_row = pd.DataFrame({
'Model': [name],
'Final Train Accuracy': [final_train_acc],
'Final Val Accuracy': [final_val_acc],
})
large_results_df = pd.concat([large_results_df, new_row], ignore_index=True)
# Sort results by validation accuracy (optional)
large_results_df = large_results_df.sort_values(by='Final Val Accuracy', ascending=False).reset_index(drop=True)
# Function to highlight top 3 and bottom 3
def highlight_top_bottom(s):
sorted_idx = s.sort_values(ascending=False).index
styles = [''] * len(s)
# Top 3: shades of green
if len(s) >= 1: styles[sorted_idx[0]] = 'background-color: #a1d99b' # top
if len(s) >= 2: styles[sorted_idx[1]] = 'background-color: #c7e9c0'
if len(s) >= 3: styles[sorted_idx[2]] = 'background-color: #e5f5e0'
# Bottom 3: shades of red
if len(s) >= 1: styles[sorted_idx[-1]] = 'background-color: #fc9272' # worst
if len(s) >= 2: styles[sorted_idx[-2]] = 'background-color: #fcbba1'
if len(s) >= 3: styles[sorted_idx[-3]] = 'background-color: #fee0d2'
return styles
# Apply styling
styled_df = large_results_df.style.apply(highlight_top_bottom, subset=['Final Train Accuracy'])
styled_df = styled_df.apply(highlight_top_bottom, subset=['Final Val Accuracy'])
display(styled_df)
| | Model | Final Train Accuracy | Final Val Accuracy |
|---|---|---|---|
| 0 | Custom CNN | 0.983133 | 0.967273 |
| 1 | ResNet50 | 0.947208 | 0.931818 |
| 2 | VGG | 0.870593 | 0.922727 |
| 3 | ResNet50 with Augmented Data | 0.966743 | 0.904545 |
| 4 | Custom CNN with Augmented Data | 0.935582 | 0.734091 |
| 5 | VGG with Augmented Data | 0.700496 | 0.693636 |
| 6 | DenseNet | 0.810082 | 0.676818 |
| 7 | DenseNet with Augmented Data | 0.758052 | 0.671818 |
| 8 | MobileNet | 0.726415 | 0.379545 |
| 9 | Dummy Baseline | 0.554888 | 0.326364 |
| 10 | MobileNet with Augmented Data | 0.765771 | 0.308182 |
| 11 | Dummy Baseline with Augmented Data | 0.333810 | 0.171364 |
Top models for 101x101 Images¶
We can observe that the top 3 models (based on validation accuracy) are:
- Custom CNN
- ResNet50
- VGG
The worst three are:
- Dummy Baseline
- MobileNet with Augmented Data
- Dummy Baseline with Augmented Data
On 101x101, our augmented models performed comparatively better. With more pixels to spare, the same transformations destroy proportionally less information, so the network retains enough detail to benefit from the added variety.
Hyperparameter Tuning¶
What metric will we use to hypertune?¶
Earlier we discussed the various metrics and their best use cases.
Given our earlier balancing of the dataset and this non-safety-critical 11-way vegetable classification task, plain accuracy is a reasonable single "how often am I right" measure, since class frequencies will not skew it. Hence, we will primarily use accuracy as our objective metric when determining the best tuned model.
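A quick sketch of why this holds (hypothetical label counts, pure NumPy): on balanced labels a trivial majority-class guesser scores only ~1/11, so accuracy tracks genuine skill, whereas on imbalanced labels it would look deceptively strong:

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Balanced 11-class labels (as in our dataset after balancing): a trivial
# always-predict-class-0 guesser only reaches ~1/11 accuracy.
balanced = np.repeat(np.arange(11), 100)
print(accuracy(balanced, np.zeros_like(balanced)))      # ~0.0909

# Hypothetical imbalanced labels: the same guesser looks deceptively good,
# which is why accuracy alone would be unsafe without the earlier balancing.
imbalanced = np.array([0] * 900 + list(range(1, 11)) * 10)
print(accuracy(imbalanced, np.zeros_like(imbalanced)))  # 0.9
```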
23x23 Dataset Hypertuning¶
Here we will be hypertuning our top model trained on the 23x23 images, which is our Custom CNN trained on non-augmented data.
23x23 CNN non-augmented tuning¶
filters_block1 = [32, 64]
Why: The first convolutional block typically learns basic low-level features, such as edges, textures, and simple shapes.
32 filters help keep the model lightweight, which is crucial when working with smaller input images (e.g., 23x23 pixels) as it reduces computational overhead.
64 filters offer more capacity for feature extraction, which allows the network to learn richer representations, though it comes at the cost of increased memory usage and computation.
Balancing these values helps strike a good trade-off between model capacity (the ability to learn complex features) and the risk of overfitting, especially when dealing with small datasets.
filters_block2 = [64, 128]
Why: The second convolutional block captures more complex, higher-level patterns, like shapes and textures (e.g., the outline of a leaf or surface patterns).
After pooling, the spatial resolution of the feature maps reduces, so having larger filter sizes (128) allows the network to capture more abstract and complex features from the reduced input space.
Increasing the filter count here helps compensate for the reduction in spatial size while enabling the network to extract richer, more detailed representations of the input data.
dense_units = [64, 128]
Why: Determines the capacity of your fully connected classifier.
64 units are typically enough for simpler tasks or when the data is less complex, helping keep the model compact and efficient.
128 units provide more capacity, enabling the model to learn more detailed and sophisticated representations at the cost of additional computational resources.
Tuning this parameter allows the network to find a good balance between underfitting (too few units) and overfitting (too many units), helping the model generalize well to unseen data without becoming overly complex.
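To make the capacity trade-off concrete, here is a back-of-envelope parameter count for each candidate width, assuming the 128-channel GlobalAveragePooling output of block 3 in the CNN defined below:

```python
# Rough parameter cost of each candidate classifier width. The 128-feature
# input assumes block 3's GlobalAveragePooling output; 11 output classes.
gap_features, classes = 128, 11

for units in (64, 128):
    head = gap_features * units + units   # Dense(units): weights + biases
    out = units * classes + classes       # final softmax layer
    print(f"dense_units={units}: {head + out:,} parameters")
```

Doubling the width roughly doubles the classifier's parameter count (about 9k vs 18k here), which is the capacity-versus-overfitting trade-off the tuner explores.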
dropout_rate = [0.2, 0.3, 0.4, 0.5]
Why: Dropout is a regularization technique that helps prevent overfitting by randomly setting a fraction of the neurons to zero during training.
For small input sizes (like 23x23 images), dropout is particularly important because it reduces the chance of the model memorizing noisy or irrelevant patterns from the data.
Exploring a range of dropout rates (from mild to aggressive) allows you to find the right level of regularization. If the dropout rate is too low, the model might overfit, but if it's too high, the model might underfit and struggle to learn useful patterns.
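For intuition, a minimal sketch of inverted dropout (the variant Keras uses), with hypothetical activations:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training=True):
    """Inverted dropout: zero a `rate` fraction of activations during
    training and rescale survivors so the expected value is unchanged."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

x = np.ones(100_000)
y = dropout(x, rate=0.4)
print(f"zeroed fraction: {np.mean(y == 0):.2f}, mean preserved: {y.mean():.2f}")
```

About 40% of activations are zeroed, yet the mean activation stays near 1.0, so inference (with `training=False`) needs no rescaling.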
l2_reg = log scale from 1e-5 to 1e-2
Why: L2 regularization penalizes large weights by adding a penalty to the loss function, which helps prevent overfitting by forcing the model to learn simpler, smaller weight values.
Logarithmic scale is used here because small changes in regularization strength (especially in lower values) can have a significant impact on model performance. It gives you finer control over the regularization process.
Choosing the right L2 regularization strength is crucial for preventing the model from becoming overly complex and fitting noise, while still allowing it to capture meaningful patterns.
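A small sketch of what log-scale (log-uniform) sampling means in practice; the seed and sample count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Log-uniform sampling over [1e-5, 1e-2]: sample the exponent uniformly,
# so each decade (1e-5..1e-4, 1e-4..1e-3, 1e-3..1e-2) is equally probable.
samples = 10.0 ** rng.uniform(-5, -2, size=10_000)

for lo in (-5, -4, -3):
    frac = np.mean((samples >= 10.0 ** lo) & (samples < 10.0 ** (lo + 1)))
    print(f"1e{lo}..1e{lo + 1}: {frac:.1%}")
```

Each decade receives roughly a third of the samples; a plain uniform draw over the same interval would instead spend ~90% of its trials above 1e-3.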
Activation functions (relu, leaky_relu)
Why: The activation function introduces nonlinearity into the network, enabling it to learn complex patterns.
ReLU is the default choice because it's efficient and works well in many scenarios, but it can suffer from the “dying ReLU” problem, where neurons can become inactive during training (especially when the input is very small or negative).
Leaky ReLU mitigates this issue by allowing a small negative gradient when the input is less than zero, ensuring neurons continue to learn even if their activations are negative.
Tuning activation functions across convolutional and dense layers helps the model adapt to the nonlinearities in the data, especially for tasks with low-resolution images where learning fine-grained features is more challenging.
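A minimal NumPy sketch of the two activations (the negative slope of 0.1 here is illustrative, not Keras' default):

```python
import numpy as np

def relu(x):
    # Zero gradient for all negative inputs: the source of "dying ReLU".
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.1):
    # A small negative slope keeps gradients flowing for negative inputs.
    return np.where(x > 0, x, negative_slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negative inputs are clamped to 0
print(leaky_relu(x))  # negative inputs are scaled, not silenced
```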
optimizer = ['adam', 'adamax', 'nadam', 'sgd']
Why: The optimizer controls how the model's weights are updated during training. Different optimizers behave differently, especially when dealing with small images.
Adam is a widely used baseline optimizer that combines adaptive learning rates with momentum, making it a reliable choice for many tasks.
Adamax is a variant of Adam that can sometimes perform better with sparse gradients, which may occur when working with images that have many flat regions (e.g., backgrounds).
Nadam combines Adam with Nesterov momentum, which can speed up convergence and sometimes lead to better performance.
SGD (Stochastic Gradient Descent) is a classic optimizer that often requires more tuning (learning rate, momentum), but can provide better generalization when used correctly.
Exploring these optimizers helps find the one that best suits the learning dynamics of small image tasks.
learning_rate = log scale from 1e-5 to 1e-2
Why: The learning rate controls the size of the steps the optimizer takes when updating model parameters.
A log scale is used because the learning rate has a large effect on training dynamics. Very small learning rates might lead to slow convergence, while too high a learning rate could cause the model to miss the optimal solution.
Capturing both slow and fast learning behaviors with this log scale helps find the optimal balance between stability (avoiding overshooting the minimum) and speed (converging efficiently).
def build_small_custom_cnn(hp):
filters_block1 = hp.Choice('filters_block1', values=[32, 64])
filters_block2 = hp.Choice('filters_block2', values=[64, 128])
dense_units = hp.Choice('dense_units', values=[64, 128])
dropout_rate = hp.Float('dropout_rate', min_value=0.2, max_value=0.5, step=0.1)
l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')
# Tunable activation functions
act_block1 = hp.Choice('activation_block1', values=['relu', 'leaky_relu'])
act_block2 = hp.Choice('activation_block2', values=['relu', 'leaky_relu'])
act_block3 = hp.Choice('activation_block3', values=['relu', 'leaky_relu'])
act_dense = hp.Choice('activation_dense', values=['relu', 'leaky_relu'])
# Tunable optimizer and learning rate
optimizer_choice = hp.Choice('optimizer', values=['adam', 'adamax', 'nadam', 'sgd'])
learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-2, sampling='log')
# Build optimizer with learning rate
if optimizer_choice == 'adam':
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
elif optimizer_choice == 'adamax':
optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
elif optimizer_choice == 'nadam':
optimizer = tf.keras.optimizers.Nadam(learning_rate=learning_rate)
elif optimizer_choice == 'sgd':
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
reg = tf.keras.regularizers.l2(l2_reg)
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(23, 23, 1)),
# Block 1
tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(dropout_rate),
# Block 2
tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Dropout(dropout_rate),
# Block 3
tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(dropout_rate),
# Classifier
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(dense_units, activation=act_dense, kernel_regularizer=reg),
tf.keras.layers.Dropout(dropout_rate),
tf.keras.layers.Dense(11, activation='softmax')
])
model.compile(
optimizer=optimizer,
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
return model
small_cnn_tuner = RandomSearch(
build_small_custom_cnn,
objective='val_accuracy',
max_trials=10,
directory='cnn_tuner',
project_name='small_non_augmented_cnn_5'
)
small_cnn_tuner.search(small_train,
validation_data=small_val,
epochs=20,
callbacks=[early_stop, reduce_lr])
Trial 10 Complete [00h 01m 21s] val_accuracy: 0.34909090399742126 Best val_accuracy So Far: 0.9527272582054138 Total elapsed time: 00h 15m 12s
Note: Hyperparameter tuning was conducted iteratively to optimize model performance. As a result, the hyperparameters presented here may differ from those used in subsequent stages, reflecting configurations that yielded better performance during earlier evaluations.
small_tuned_cnn = small_cnn_tuner.get_best_models(num_models=1)[0]
tuned_cnn_hyperparams = small_cnn_tuner.get_best_hyperparameters(1)[0]
print("Best hyperparameters:")
print(tuned_cnn_hyperparams.values)
Best hyperparameters:
{'filters_block1': 64, 'filters_block2': 128, 'dense_units': 128, 'dropout_rate': 0.4, 'l2_reg': 1.3179456160919495e-05, 'activation_block1': 'relu', 'activation_block2': 'relu', 'activation_block3': 'relu', 'activation_dense': 'relu', 'optimizer': 'adam', 'learning_rate': 0.0014127161702417554}
/usr/local/lib/python3.11/dist-packages/keras/src/saving/saving_lib.py:757: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 58 variables. saveable.load_own_variables(weights_store.get(inner_path))
101x101 Dataset Hypertuning¶
Here we will be hypertuning our top model trained on the 101x101 images, which is our Custom CNN trained on non-augmented data.
101x101 CNN non-augmented tuning¶
The search space and its rationale are identical to the 23x23 tuning above (same ranges for filters, dense units, dropout, L2 regularization, activations, optimizer, and learning rate); only the input shape changes to 101x101.
def build_large_custom_cnn(hp):
filters_block1 = hp.Choice('filters_block1', values=[32, 64])
filters_block2 = hp.Choice('filters_block2', values=[64, 128])
dense_units = hp.Choice('dense_units', values=[64, 128])
dropout_rate = hp.Float('dropout_rate', min_value=0.2, max_value=0.5, step=0.1)
l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')
# Tunable activation functions
act_block1 = hp.Choice('activation_block1', values=['relu', 'leaky_relu'])
act_block2 = hp.Choice('activation_block2', values=['relu', 'leaky_relu'])
act_block3 = hp.Choice('activation_block3', values=['relu', 'leaky_relu'])
act_dense = hp.Choice('activation_dense', values=['relu', 'leaky_relu'])
# Tunable optimizer and learning rate
optimizer_choice = hp.Choice('optimizer', values=['adam', 'adamax', 'nadam', 'sgd'])
learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-2, sampling='log')
# Build optimizer with learning rate
if optimizer_choice == 'adam':
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
elif optimizer_choice == 'adamax':
optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
elif optimizer_choice == 'nadam':
optimizer = tf.keras.optimizers.Nadam(learning_rate=learning_rate)
elif optimizer_choice == 'sgd':
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
reg = tf.keras.regularizers.l2(l2_reg)
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(101, 101, 1)),
# Block 1
tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Conv2D(filters_block1, (3, 3), activation=act_block1, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(dropout_rate),
# Block 2
tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Conv2D(filters_block2, (3, 3), activation=act_block2, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Dropout(dropout_rate),
# Block 3
tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Conv2D(128, (3, 3), activation=act_block3, padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(dropout_rate),
# Classifier
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(dense_units, activation=act_dense, kernel_regularizer=reg),
tf.keras.layers.Dropout(dropout_rate),
tf.keras.layers.Dense(11, activation='softmax')
])
model.compile(
optimizer=optimizer,
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
return model
large_cnn_tuner = RandomSearch(
build_large_custom_cnn,
objective='val_accuracy',
max_trials=5,
directory='cnn_tuner',
project_name='large_non_augmented_cnn_2'
)
large_cnn_tuner.search(large_train,
validation_data=large_val,
epochs=20,
callbacks=[early_stop, reduce_lr])
Trial 5 Complete [00h 10m 59s]
val_accuracy: 0.4854545593261719
Best val_accuracy So Far: 0.9654545187950134
Total elapsed time: 01h 03m 42s
Note: Hyperparameter tuning was conducted iteratively to optimize model performance. As a result, the hyperparameters presented here may differ from those used in subsequent stages, reflecting configurations that yielded better performance during earlier evaluations.
large_tuned_cnn = large_cnn_tuner.get_best_models(num_models=1)[0]
tuned_cnn_hyperparams = large_cnn_tuner.get_best_hyperparameters(1)[0]
print("Best hyperparameters:")
print(tuned_cnn_hyperparams.values)
Best hyperparameters:
{'filters_block1': 32, 'filters_block2': 128, 'dense_units': 128, 'dropout_rate': 0.30000000000000004, 'l2_reg': 1.3442250945870623e-05, 'activation_block1': 'relu', 'activation_block2': 'relu', 'activation_block3': 'leaky_relu', 'activation_dense': 'leaky_relu', 'optimizer': 'nadam', 'learning_rate': 0.00040405637109264074}
Best Model¶
This is the early-stopping callback used when training our best models. We set the start_from_epoch parameter so that the model trains for a minimum number of epochs rather than stopping too early; in our experiments, stopping before 20 epochs made the model perform worse on the test set.
custom_early_stop = tf.keras.callbacks.EarlyStopping(
patience=5,
min_delta=0.0001,
restore_best_weights=True,
monitor='val_loss',
start_from_epoch=20
)
Here we define a function to plot the learning curves from the model's history.
def plot_history(history, title="CNN Model", metric_name='accuracy'):
# Ensure the history contains the correct keys for accuracy and loss
acc = history.history.get(metric_name, [])
val_acc = history.history.get(f'val_{metric_name}', [])
loss = history.history.get('loss', [])
val_loss = history.history.get('val_loss', [])
# Generate a range for the number of epochs
epochs_range = range(1, len(acc) + 1)
plt.figure(figsize=(14, 5))
# ----- Accuracy Plot ----- #
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.title(f'{title} - Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
# ----- Loss Plot ----- #
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.title(f'{title} - Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
23x23 Images¶
Best CNN¶
def build_small_custom_cnn_best():
filters_block1 = 64
filters_block2 = 64
dense_units = 64
dropout_rate = 0.2
l2_reg = 3.7224610062669776e-05
act_block1 = 'relu'
act_block2 = 'leaky_relu'
act_block3 = 'relu'
act_dense = 'relu'
learning_rate = 0.0017584971517999608
optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
reg = tf.keras.regularizers.l2(l2_reg)
def get_activation(act_name):
if act_name == 'leaky_relu':
return tf.keras.layers.LeakyReLU()
else:
return tf.keras.layers.Activation(act_name)
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(23, 23, 1)),
# Block 1
tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block1),
tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block1),
tf.keras.layers.Dropout(dropout_rate),
# Block 2
tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block2),
tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block2),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Dropout(dropout_rate),
# Block 3
tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block3),
tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block3),
tf.keras.layers.Dropout(dropout_rate),
# Classifier
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(dense_units, kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_dense),
tf.keras.layers.Dropout(dropout_rate),
tf.keras.layers.Dense(11, activation='softmax')
])
model.compile(
optimizer=optimizer,
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
return model
best_small_cnn = build_small_custom_cnn_best()
best_small_cnn.summary()
Model: "sequential_1"
Layer (type)                                        | Output Shape         | Param #
conv2d (Conv2D)                                     | (None, 23, 23, 64)   | 640
batch_normalization (BatchNormalization)            | (None, 23, 23, 64)   | 256
activation (Activation)                             | (None, 23, 23, 64)   | 0
conv2d_1 (Conv2D)                                   | (None, 23, 23, 64)   | 36,928
batch_normalization_1 (BatchNormalization)          | (None, 23, 23, 64)   | 256
activation_1 (Activation)                           | (None, 23, 23, 64)   | 0
dropout (Dropout)                                   | (None, 23, 23, 64)   | 0
conv2d_2 (Conv2D)                                   | (None, 23, 23, 64)   | 36,928
batch_normalization_2 (BatchNormalization)          | (None, 23, 23, 64)   | 256
leaky_re_lu (LeakyReLU)                             | (None, 23, 23, 64)   | 0
conv2d_3 (Conv2D)                                   | (None, 23, 23, 64)   | 36,928
batch_normalization_3 (BatchNormalization)          | (None, 23, 23, 64)   | 256
leaky_re_lu_1 (LeakyReLU)                           | (None, 23, 23, 64)   | 0
max_pooling2d (MaxPooling2D)                        | (None, 11, 11, 64)   | 0
dropout_1 (Dropout)                                 | (None, 11, 11, 64)   | 0
conv2d_4 (Conv2D)                                   | (None, 11, 11, 128)  | 73,856
batch_normalization_4 (BatchNormalization)          | (None, 11, 11, 128)  | 512
activation_2 (Activation)                           | (None, 11, 11, 128)  | 0
conv2d_5 (Conv2D)                                   | (None, 11, 11, 128)  | 147,584
batch_normalization_5 (BatchNormalization)          | (None, 11, 11, 128)  | 512
activation_3 (Activation)                           | (None, 11, 11, 128)  | 0
dropout_2 (Dropout)                                 | (None, 11, 11, 128)  | 0
global_average_pooling2d (GlobalAveragePooling2D)   | (None, 128)          | 0
dense (Dense)                                       | (None, 64)           | 8,256
batch_normalization_6 (BatchNormalization)          | (None, 64)           | 256
activation_4 (Activation)                           | (None, 64)           | 0
dropout_3 (Dropout)                                 | (None, 64)           | 0
dense_1 (Dense)                                     | (None, 11)           | 715
Total params: 344,139 (1.31 MB)
Trainable params: 342,987 (1.31 MB)
Non-trainable params: 1,152 (4.50 KB)
best_small_cnn_checkpoint = tf.keras.callbacks.ModelCheckpoint(
'best_small_cnn2.weights.h5', monitor='val_accuracy', save_best_only=True, save_weights_only=True, mode='max'
)
# Train
best_small_cnn_history = best_small_cnn.fit(
small_train,
validation_data=small_val,
epochs=30,
batch_size=32,
verbose=2,
callbacks=[custom_early_stop, reduce_lr, best_small_cnn_checkpoint]
)
Epoch 1/30
328/328 - 30s - 92ms/step - accuracy: 0.4362 - loss: 1.6755 - val_accuracy: 0.0914 - val_loss: 4.4623 - learning_rate: 0.0018
Epoch 2/30
328/328 - 3s - 8ms/step - accuracy: 0.6357 - loss: 1.1364 - val_accuracy: 0.3973 - val_loss: 2.1620 - learning_rate: 0.0018
Epoch 3/30
328/328 - 3s - 9ms/step - accuracy: 0.7156 - loss: 0.9051 - val_accuracy: 0.5168 - val_loss: 1.5474 - learning_rate: 0.0018
Epoch 4/30
328/328 - 3s - 9ms/step - accuracy: 0.7651 - loss: 0.7677 - val_accuracy: 0.6055 - val_loss: 1.1896 - learning_rate: 0.0018
Epoch 5/30
328/328 - 3s - 8ms/step - accuracy: 0.7956 - loss: 0.6607 - val_accuracy: 0.6886 - val_loss: 0.9974 - learning_rate: 0.0018
Epoch 6/30
328/328 - 3s - 8ms/step - accuracy: 0.8250 - loss: 0.5694 - val_accuracy: 0.6341 - val_loss: 1.4299 - learning_rate: 0.0018
Epoch 7/30
328/328 - 3s - 8ms/step - accuracy: 0.8445 - loss: 0.5181 - val_accuracy: 0.8432 - val_loss: 0.4943 - learning_rate: 0.0018
Epoch 8/30
328/328 - 5s - 16ms/step - accuracy: 0.8614 - loss: 0.4668 - val_accuracy: 0.7895 - val_loss: 0.6491 - learning_rate: 0.0018
Epoch 9/30
328/328 - 5s - 16ms/step - accuracy: 0.8747 - loss: 0.4154 - val_accuracy: 0.7145 - val_loss: 0.8420 - learning_rate: 0.0018
Epoch 10/30
Epoch 10: ReduceLROnPlateau reducing learning rate to 0.0008792486041784286.
328/328 - 3s - 8ms/step - accuracy: 0.8816 - loss: 0.3907 - val_accuracy: 0.6718 - val_loss: 1.0666 - learning_rate: 0.0018
Epoch 11/30
328/328 - 5s - 16ms/step - accuracy: 0.9217 - loss: 0.2811 - val_accuracy: 0.8873 - val_loss: 0.3458 - learning_rate: 8.7925e-04
Epoch 12/30
328/328 - 5s - 16ms/step - accuracy: 0.9295 - loss: 0.2547 - val_accuracy: 0.8709 - val_loss: 0.4351 - learning_rate: 8.7925e-04
Epoch 13/30
328/328 - 5s - 16ms/step - accuracy: 0.9393 - loss: 0.2314 - val_accuracy: 0.8423 - val_loss: 0.5182 - learning_rate: 8.7925e-04
Epoch 14/30
Epoch 14: ReduceLROnPlateau reducing learning rate to 0.0004396243020892143.
328/328 - 3s - 8ms/step - accuracy: 0.9444 - loss: 0.2153 - val_accuracy: 0.8305 - val_loss: 0.5314 - learning_rate: 8.7925e-04
Epoch 15/30
328/328 - 3s - 8ms/step - accuracy: 0.9568 - loss: 0.1772 - val_accuracy: 0.9200 - val_loss: 0.2902 - learning_rate: 4.3962e-04
Epoch 16/30
328/328 - 3s - 8ms/step - accuracy: 0.9625 - loss: 0.1621 - val_accuracy: 0.9027 - val_loss: 0.3391 - learning_rate: 4.3962e-04
Epoch 17/30
328/328 - 3s - 9ms/step - accuracy: 0.9639 - loss: 0.1535 - val_accuracy: 0.9168 - val_loss: 0.3148 - learning_rate: 4.3962e-04
Epoch 18/30
328/328 - 5s - 15ms/step - accuracy: 0.9669 - loss: 0.1471 - val_accuracy: 0.9227 - val_loss: 0.2854 - learning_rate: 4.3962e-04
Epoch 19/30
328/328 - 3s - 8ms/step - accuracy: 0.9690 - loss: 0.1386 - val_accuracy: 0.9364 - val_loss: 0.2359 - learning_rate: 4.3962e-04
Epoch 20/30
328/328 - 3s - 8ms/step - accuracy: 0.9701 - loss: 0.1376 - val_accuracy: 0.9355 - val_loss: 0.2347 - learning_rate: 4.3962e-04
Epoch 21/30
328/328 - 3s - 9ms/step - accuracy: 0.9700 - loss: 0.1345 - val_accuracy: 0.9282 - val_loss: 0.2506 - learning_rate: 4.3962e-04
Epoch 22/30
328/328 - 3s - 8ms/step - accuracy: 0.9736 - loss: 0.1221 - val_accuracy: 0.9350 - val_loss: 0.2506 - learning_rate: 4.3962e-04
Epoch 23/30
328/328 - 3s - 9ms/step - accuracy: 0.9750 - loss: 0.1220 - val_accuracy: 0.9395 - val_loss: 0.2317 - learning_rate: 4.3962e-04
Epoch 24/30
328/328 - 3s - 8ms/step - accuracy: 0.9762 - loss: 0.1195 - val_accuracy: 0.9532 - val_loss: 0.1915 - learning_rate: 4.3962e-04
Epoch 25/30
328/328 - 3s - 8ms/step - accuracy: 0.9772 - loss: 0.1142 - val_accuracy: 0.9386 - val_loss: 0.2225 - learning_rate: 4.3962e-04
Epoch 26/30
328/328 - 5s - 15ms/step - accuracy: 0.9799 - loss: 0.1062 - val_accuracy: 0.9059 - val_loss: 0.3116 - learning_rate: 4.3962e-04
Epoch 27/30
Epoch 27: ReduceLROnPlateau reducing learning rate to 0.00021981215104460716.
328/328 - 3s - 8ms/step - accuracy: 0.9767 - loss: 0.1091 - val_accuracy: 0.9491 - val_loss: 0.2079 - learning_rate: 4.3962e-04
Epoch 28/30
328/328 - 3s - 8ms/step - accuracy: 0.9819 - loss: 0.0967 - val_accuracy: 0.9468 - val_loss: 0.2196 - learning_rate: 2.1981e-04
Epoch 29/30
328/328 - 5s - 16ms/step - accuracy: 0.9848 - loss: 0.0922 - val_accuracy: 0.9582 - val_loss: 0.1832 - learning_rate: 2.1981e-04
Epoch 30/30
328/328 - 3s - 8ms/step - accuracy: 0.9873 - loss: 0.0872 - val_accuracy: 0.9541 - val_loss: 0.2007 - learning_rate: 2.1981e-04
Learning Curve of model¶
plot_history(best_small_cnn_history)
Insights from Learning Curve:¶
Good Generalization:
From around epoch 15 onward, training and validation accuracy both plateau near 95%. Validation loss settles low and roughly tracks training loss, so the model is neither severely over- nor under-fitting.
Early-epoch noise:
There are noticeable spikes in validation loss around epochs 6, 9, and 10, with corresponding dips in validation accuracy.
Smooth convergence later:
After ~15 epochs, both curves settle into a smooth decline in loss and a steady rise in accuracy. This suggests the network's capacity is sufficient for the task and that it eventually "absorbs" the augmentation noise.
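The generalization claim above can be checked numerically rather than by eye. Below is a minimal sketch of how the train-validation accuracy gap can be computed from a Keras-style history dictionary; the accuracy values are illustrative stand-ins, not the actual run's history.

```python
# Illustrative Keras-style history dict (stand-in values, not the real run)
history = {
    "accuracy":     [0.88, 0.92, 0.95, 0.97],
    "val_accuracy": [0.67, 0.89, 0.92, 0.95],
}

# Per-epoch gap between training and validation accuracy;
# a shrinking gap over epochs indicates improving generalization
gaps = [round(a - v, 2) for a, v in zip(history["accuracy"], history["val_accuracy"])]
print(gaps)       # → [0.21, 0.03, 0.03, 0.02]
print(max(gaps))  # worst-case gap across training
```

The same calculation applied to a real `model.fit(...).history` dict gives a quick one-number summary of over-fitting at any point in training.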
101x101 Images¶
Best CNN¶
def build_best_large_custom_cnn():
# Best hyperparameters
filters_block1 = 64
filters_block2 = 64
dense_units = 128
dropout_rate = 0.2
l2_reg = 0.0002833715647014918
act_block1 = 'relu'
act_block2 = 'relu'
act_block3 = 'relu'
act_dense = 'relu'
optimizer_choice = 'adam'
learning_rate = 0.0010643000310335154
def get_activation(act_name):
if act_name == 'relu':
return tf.keras.layers.ReLU()
elif act_name == 'leaky_relu':
return tf.keras.layers.LeakyReLU()
elif act_name == 'elu':
return tf.keras.layers.ELU()
else:
raise ValueError(f"Unsupported activation: {act_name}")
# Optimizer
if optimizer_choice == 'adam':
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
elif optimizer_choice == 'adamax':
optimizer = tf.keras.optimizers.Adamax(learning_rate=learning_rate)
elif optimizer_choice == 'nadam':
optimizer = tf.keras.optimizers.Nadam(learning_rate=learning_rate)
elif optimizer_choice == 'sgd':
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
else:
raise ValueError(f"Unsupported optimizer: {optimizer_choice}")
reg = tf.keras.regularizers.l2(l2_reg)
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(101, 101, 1)),
# Block 1
tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block1),
tf.keras.layers.Conv2D(filters_block1, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block1),
tf.keras.layers.Dropout(dropout_rate),
# Block 2
tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block2),
tf.keras.layers.Conv2D(filters_block2, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block2),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Dropout(dropout_rate),
# Block 3
tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block3),
tf.keras.layers.Conv2D(128, (3, 3), padding='same', kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_block3),
tf.keras.layers.Dropout(dropout_rate),
# Classifier
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(dense_units, kernel_regularizer=reg),
tf.keras.layers.BatchNormalization(),
get_activation(act_dense),
tf.keras.layers.Dropout(dropout_rate),
tf.keras.layers.Dense(11, activation='softmax')
])
model.compile(
optimizer=optimizer,
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
return model
best_large_cnn = build_best_large_custom_cnn()
best_large_cnn.summary()
Model: "sequential_2"
Layer (type)                                          | Output Shape          | Param #
conv2d_6 (Conv2D)                                     | (None, 101, 101, 64)  | 640
batch_normalization_7 (BatchNormalization)            | (None, 101, 101, 64)  | 256
re_lu_7 (ReLU)                                        | (None, 101, 101, 64)  | 0
conv2d_7 (Conv2D)                                     | (None, 101, 101, 64)  | 36,928
batch_normalization_8 (BatchNormalization)            | (None, 101, 101, 64)  | 256
re_lu_8 (ReLU)                                        | (None, 101, 101, 64)  | 0
dropout_4 (Dropout)                                   | (None, 101, 101, 64)  | 0
conv2d_8 (Conv2D)                                     | (None, 101, 101, 64)  | 36,928
batch_normalization_9 (BatchNormalization)            | (None, 101, 101, 64)  | 256
re_lu_9 (ReLU)                                        | (None, 101, 101, 64)  | 0
conv2d_9 (Conv2D)                                     | (None, 101, 101, 64)  | 36,928
batch_normalization_10 (BatchNormalization)           | (None, 101, 101, 64)  | 256
re_lu_10 (ReLU)                                       | (None, 101, 101, 64)  | 0
max_pooling2d_1 (MaxPooling2D)                        | (None, 50, 50, 64)    | 0
dropout_5 (Dropout)                                   | (None, 50, 50, 64)    | 0
conv2d_10 (Conv2D)                                    | (None, 50, 50, 128)   | 73,856
batch_normalization_11 (BatchNormalization)           | (None, 50, 50, 128)   | 512
re_lu_11 (ReLU)                                       | (None, 50, 50, 128)   | 0
conv2d_11 (Conv2D)                                    | (None, 50, 50, 128)   | 147,584
batch_normalization_12 (BatchNormalization)           | (None, 50, 50, 128)   | 512
re_lu_12 (ReLU)                                       | (None, 50, 50, 128)   | 0
dropout_6 (Dropout)                                   | (None, 50, 50, 128)   | 0
global_average_pooling2d_1 (GlobalAveragePooling2D)   | (None, 128)           | 0
dense_2 (Dense)                                       | (None, 128)           | 16,512
batch_normalization_13 (BatchNormalization)           | (None, 128)           | 512
re_lu_13 (ReLU)                                       | (None, 128)           | 0
dropout_7 (Dropout)                                   | (None, 128)           | 0
dense_3 (Dense)                                       | (None, 11)            | 1,419
Total params: 353,355 (1.35 MB)
Trainable params: 352,075 (1.34 MB)
Non-trainable params: 1,280 (5.00 KB)
best_large_cnn_checkpoint = tf.keras.callbacks.ModelCheckpoint(
'best_large_cnn_3.weights.h5', monitor='val_accuracy', save_best_only=True, save_weights_only=True, mode='max'
)
best_large_cnn_history = best_large_cnn.fit(
large_train,
validation_data=large_val,
epochs=30,
batch_size=32,
verbose=2,
callbacks=[custom_early_stop, reduce_lr, best_large_cnn_checkpoint]
)
Epoch 1/30
328/328 - 53s - 162ms/step - accuracy: 0.4657 - loss: 1.7233 - val_accuracy: 0.0859 - val_loss: 6.1868 - learning_rate: 0.0011
Epoch 2/30
328/328 - 33s - 102ms/step - accuracy: 0.6957 - loss: 1.0842 - val_accuracy: 0.4155 - val_loss: 2.7520 - learning_rate: 0.0011
Epoch 3/30
328/328 - 34s - 102ms/step - accuracy: 0.7848 - loss: 0.8309 - val_accuracy: 0.2827 - val_loss: 5.8604 - learning_rate: 0.0011
Epoch 4/30
328/328 - 34s - 102ms/step - accuracy: 0.8457 - loss: 0.6619 - val_accuracy: 0.4227 - val_loss: 3.2960 - learning_rate: 0.0011
Epoch 5/30
328/328 - 34s - 102ms/step - accuracy: 0.8756 - loss: 0.5554 - val_accuracy: 0.6777 - val_loss: 1.4370 - learning_rate: 0.0011
Epoch 6/30
328/328 - 34s - 102ms/step - accuracy: 0.9024 - loss: 0.4844 - val_accuracy: 0.4268 - val_loss: 2.9050 - learning_rate: 0.0011
Epoch 7/30
328/328 - 33s - 102ms/step - accuracy: 0.9137 - loss: 0.4481 - val_accuracy: 0.3232 - val_loss: 5.0418 - learning_rate: 0.0011
Epoch 8/30
Epoch 8: ReduceLROnPlateau reducing learning rate to 0.0005321500357240438.
328/328 - 34s - 104ms/step - accuracy: 0.9224 - loss: 0.4185 - val_accuracy: 0.5068 - val_loss: 2.0721 - learning_rate: 0.0011
Epoch 9/30
328/328 - 34s - 102ms/step - accuracy: 0.9555 - loss: 0.3108 - val_accuracy: 0.8077 - val_loss: 0.7781 - learning_rate: 5.3215e-04
Epoch 10/30
328/328 - 33s - 102ms/step - accuracy: 0.9617 - loss: 0.2853 - val_accuracy: 0.9359 - val_loss: 0.3511 - learning_rate: 5.3215e-04
Epoch 11/30
328/328 - 33s - 102ms/step - accuracy: 0.9654 - loss: 0.2697 - val_accuracy: 0.7932 - val_loss: 0.7650 - learning_rate: 5.3215e-04
Epoch 12/30
328/328 - 33s - 102ms/step - accuracy: 0.9667 - loss: 0.2645 - val_accuracy: 0.7636 - val_loss: 1.1191 - learning_rate: 5.3215e-04
Epoch 13/30
Epoch 13: ReduceLROnPlateau reducing learning rate to 0.0002660750178620219.
328/328 - 34s - 104ms/step - accuracy: 0.9646 - loss: 0.2598 - val_accuracy: 0.7914 - val_loss: 0.8200 - learning_rate: 5.3215e-04
Epoch 14/30
328/328 - 41s - 124ms/step - accuracy: 0.9844 - loss: 0.1980 - val_accuracy: 0.9832 - val_loss: 0.1972 - learning_rate: 2.6608e-04
Epoch 15/30
328/328 - 33s - 102ms/step - accuracy: 0.9877 - loss: 0.1828 - val_accuracy: 0.8805 - val_loss: 0.4814 - learning_rate: 2.6608e-04
Epoch 16/30
328/328 - 33s - 102ms/step - accuracy: 0.9881 - loss: 0.1785 - val_accuracy: 0.9068 - val_loss: 0.4690 - learning_rate: 2.6608e-04
Epoch 17/30
Epoch 17: ReduceLROnPlateau reducing learning rate to 0.00013303750893101096.
328/328 - 34s - 104ms/step - accuracy: 0.9873 - loss: 0.1747 - val_accuracy: 0.8141 - val_loss: 0.6541 - learning_rate: 2.6608e-04
Epoch 18/30
328/328 - 40s - 123ms/step - accuracy: 0.9933 - loss: 0.1534 - val_accuracy: 0.9805 - val_loss: 0.1749 - learning_rate: 1.3304e-04
Epoch 19/30
328/328 - 34s - 104ms/step - accuracy: 0.9951 - loss: 0.1447 - val_accuracy: 0.9914 - val_loss: 0.1501 - learning_rate: 1.3304e-04
Epoch 20/30
328/328 - 33s - 102ms/step - accuracy: 0.9943 - loss: 0.1421 - val_accuracy: 0.9850 - val_loss: 0.1710 - learning_rate: 1.3304e-04
Epoch 21/30
328/328 - 41s - 126ms/step - accuracy: 0.9957 - loss: 0.1374 - val_accuracy: 0.9705 - val_loss: 0.2098 - learning_rate: 1.3304e-04
Epoch 22/30
Epoch 22: ReduceLROnPlateau reducing learning rate to 6.651875446550548e-05.
328/328 - 33s - 102ms/step - accuracy: 0.9966 - loss: 0.1317 - val_accuracy: 0.9505 - val_loss: 0.2438 - learning_rate: 1.3304e-04
Epoch 23/30
328/328 - 33s - 102ms/step - accuracy: 0.9975 - loss: 0.1240 - val_accuracy: 0.9909 - val_loss: 0.1313 - learning_rate: 6.6519e-05
Epoch 24/30
328/328 - 34s - 102ms/step - accuracy: 0.9988 - loss: 0.1201 - val_accuracy: 0.9923 - val_loss: 0.1334 - learning_rate: 6.6519e-05
Epoch 25/30
328/328 - 34s - 102ms/step - accuracy: 0.9983 - loss: 0.1182 - val_accuracy: 0.9945 - val_loss: 0.1249 - learning_rate: 6.6519e-05
Epoch 26/30
328/328 - 33s - 102ms/step - accuracy: 0.9985 - loss: 0.1158 - val_accuracy: 0.9927 - val_loss: 0.1288 - learning_rate: 6.6519e-05
Epoch 27/30
328/328 - 33s - 102ms/step - accuracy: 0.9995 - loss: 0.1118 - val_accuracy: 0.9905 - val_loss: 0.1361 - learning_rate: 6.6519e-05
Epoch 28/30
328/328 - 34s - 104ms/step - accuracy: 0.9979 - loss: 0.1119 - val_accuracy: 0.9918 - val_loss: 0.1204 - learning_rate: 6.6519e-05
Epoch 29/30
328/328 - 41s - 124ms/step - accuracy: 0.9986 - loss: 0.1089 - val_accuracy: 0.9895 - val_loss: 0.1281 - learning_rate: 6.6519e-05
Epoch 30/30
328/328 - 33s - 102ms/step - accuracy: 0.9985 - loss: 0.1069 - val_accuracy: 0.9873 - val_loss: 0.1360 - learning_rate: 6.6519e-05
plot_history(best_large_cnn_history)
Insights from Learning Curve:¶
Slower ramp-up:
The model starts with low validation accuracy (~9%) and very high validation loss (~6) in epoch 1, showing that the full-sized images pose a harder learning problem than the tiny 23x23 inputs. It takes until roughly epoch 10 before both training and validation accuracy crack 80%.
Mid-training volatility:
There are big swings in validation loss (spikes around epochs 3, 7, 12, and 17) with corresponding dips in validation accuracy.
Convergence and plateau:
After epoch 20, both curves smooth out: validation accuracy edges into the high 90s and validation loss drops below 0.2, indicating that the model eventually learns through the extra variability. Convergence is also cleaner in the later epochs, with validation accuracy nearly matching training accuracy.
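The mid-training volatility described above can also be detected programmatically. Below is an illustrative sketch (not part of the training run) that flags an epoch as a spike when val_loss rises by more than 50% over the previous epoch; the values are the first seven epochs copied from the log, and the 50% threshold is an arbitrary choice for illustration.

```python
# First seven validation losses from the 101x101 training log
val_loss = [6.1868, 2.7520, 5.8604, 3.2960, 1.4370, 2.9050, 5.0418]

# Flag epochs where val_loss jumps by more than 50% over the previous epoch
spikes = [
    epoch
    for epoch, (prev, curr) in enumerate(zip(val_loss, val_loss[1:]), start=2)
    if curr > 1.5 * prev
]
print(spikes)  # → [3, 6, 7]
```

Applied to the full 30-epoch history, a rule like this makes the "spike epochs" cited in the analysis reproducible instead of eyeballed from the curve.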
Model Evaluation¶
small_test = tf.keras.preprocessing.image_dataset_from_directory(
"/content/Dataset for CA1 part A - AY2526S1/test",
color_mode="grayscale",
batch_size=32,
image_size=(23,23),
shuffle=True,
seed=123
)
large_test = tf.keras.preprocessing.image_dataset_from_directory(
"/content/Dataset for CA1 part A - AY2526S1/test",
color_mode="grayscale",
batch_size=32,
image_size=(101, 101),
shuffle=True,
seed=123,
labels='inferred',
label_mode="int"
)
small_test = small_test.map(normalize_img)
large_test = large_test.map(normalize_img)
Found 2200 files belonging to 11 classes.
Found 2200 files belonging to 11 classes.
# Evaluate model on the test dataset
small_cnn_loss, small_cnn_accuracy = best_small_cnn.evaluate(small_test)
print("CNN Test accuracy:", small_cnn_accuracy)
69/69 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.9611 - loss: 0.1566
CNN Test accuracy: 0.9595454335212708
# Evaluate model on the test dataset
large_cnn_loss, large_cnn_accuracy = best_large_cnn.evaluate(large_test)
print("CNN Test accuracy:", large_cnn_accuracy)
69/69 ━━━━━━━━━━━━━━━━━━━━ 3s 36ms/step - accuracy: 0.9892 - loss: 0.1369
CNN Test accuracy: 0.9890909194946289
We can observe that the CNN trained and tested on 101x101 input performs better than the CNN trained on 23x23 input. We will further discuss this in our conclusion.
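As a quick recap of the gap between the two models, the sketch below compares the two test accuracies side by side. The values are rounded copies of the evaluation output above, hard-coded here for illustration.

```python
# Test accuracies copied (rounded) from the evaluation output above
results = {"23x23 CNN": 0.9595, "101x101 CNN": 0.9891}

best_name = max(results, key=results.get)
gap = results["101x101 CNN"] - results["23x23 CNN"]
print(f"{best_name} is higher by {gap:.2%} absolute test accuracy")
```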
Model's Weights¶
We reload the models' weights to confirm they load and evaluate correctly, and to ease reproducibility in the following sections.
best_small_cnn = build_small_custom_cnn_best()
best_small_cnn.load_weights('/content/drive/MyDrive/Datasets/best_small_cnn1.weights.h5')
test_loss, test_accuracy = best_small_cnn.evaluate(small_test)
print(f"Test accuracy after loading weights: {test_accuracy:.4f}")
69/69 ━━━━━━━━━━━━━━━━━━━━ 6s 32ms/step - accuracy: 0.9588 - loss: 0.1557
Test accuracy after loading weights: 0.9600
best_large_cnn = build_best_large_custom_cnn()
best_large_cnn.load_weights('/content/drive/MyDrive/Datasets/best_large_cnn_3.weights.h5')
test_loss, test_accuracy = best_large_cnn.evaluate(large_test)
print(f"Test accuracy after loading weights: {test_accuracy:.4f}")
69/69 ━━━━━━━━━━━━━━━━━━━━ 6s 54ms/step - accuracy: 0.9894 - loss: 0.1405
Test accuracy after loading weights: 0.9895
Model Metrics¶
Confusion matrices, classification reports, and model layers will be visualized in this section.
def plot_confusion_matrix(name, model, test_dataset, class_names):
# Get true labels and predictions
y_true = []
y_pred = []
for X_batch, y_batch in test_dataset:
# Check if y_batch is one-hot encoded or integer labels
if len(y_batch.shape) == 2: # One-hot encoded labels (batch_size, num_classes)
y_true.extend(np.argmax(y_batch.numpy(), axis=1))
else: # Integer labels (batch_size,)
y_true.extend(y_batch.numpy())
# Get predictions
y_pred_probs = model.predict(X_batch, verbose=0)
y_pred.extend(np.argmax(y_pred_probs, axis=1)) # Predicted class indices
y_true = np.array(y_true)
y_pred = np.array(y_pred)
# Compute the confusion matrix
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
# Plot the confusion matrix
fig, ax = plt.subplots(figsize=(12, 10))
disp.plot(cmap=plt.cm.Blues, ax=ax, xticks_rotation=90)
plt.title(f"Confusion Matrix: {name}", fontsize=16)
plt.tight_layout()
plt.show()
# Class-wise Accuracy
print("Class-wise Accuracy:\n")
total_per_class = cm.sum(axis=1)
correct_per_class = np.diag(cm)
for i, class_name in enumerate(class_names):
acc = correct_per_class[i] / total_per_class[i]
print(f"{class_name:30}: {acc:.2%}")
# Overall Accuracy
overall_acc = np.sum(correct_per_class) / np.sum(cm)
print(f"\nOverall Accuracy: {overall_acc:.2%}")
print('-'*100)
Confusion Matrix for the 23x23 best model¶
plot_confusion_matrix("23x23 CNN Model", best_small_cnn, small_test, class_names=class_names)
Class-wise Accuracy:

Capsicum                      : 98.00%
Tomato                        : 95.00%
Bitter_Gourd                  : 95.00%
Pumpkin                       : 93.50%
Bean                          : 98.50%
Brinjal                       : 93.00%
Cabbage                       : 99.50%
Cucumber and Bottle_Gourd     : 95.50%
Radish and Carrot             : 94.50%
Potato                        : 98.50%
Cauliflower and Broccoli      : 95.00%

Overall Accuracy: 96.00%
# Collect true and predicted labels
y_true = []
y_pred = []
for images, labels in small_test:
    predictions = best_small_cnn.predict(images, verbose=0)
    predicted_labels = np.argmax(predictions, axis=1)
    y_true.extend(labels.numpy())
    y_pred.extend(predicted_labels)
# Generate classification report
report = classification_report(y_true, y_pred, target_names=class_names)
print("Classification Report for the 23x23 Model:\n")
print(report)
Classification Report for the 23x23 Model:
precision recall f1-score support
Bean 0.96 0.98 0.97 200
Bitter_Gourd 0.98 0.95 0.97 200
Brinjal 0.95 0.95 0.95 200
Cabbage 0.96 0.94 0.95 200
Capsicum 0.99 0.98 0.99 200
Cauliflower and Broccoli 0.90 0.93 0.92 200
Cucumber and Bottle_Gourd 0.95 0.99 0.97 200
Potato 0.97 0.95 0.96 200
Pumpkin 0.98 0.94 0.96 200
Radish and Carrot 0.98 0.98 0.98 200
Tomato 0.94 0.95 0.95 200
accuracy 0.96 2200
macro avg 0.96 0.96 0.96 2200
weighted avg 0.96 0.96 0.96 2200
Confusion Matrix for the 101x101 best model¶
plot_confusion_matrix("101x101 CNN Model", best_large_cnn, large_test, class_names=class_names)
Class-wise Accuracy:

Capsicum                      : 99.50%
Tomato                        : 99.00%
Bitter_Gourd                  : 97.50%
Pumpkin                       : 99.00%
Bean                          : 99.50%
Brinjal                       : 98.00%
Cabbage                       : 100.00%
Cucumber and Bottle_Gourd     : 98.00%
Radish and Carrot             : 99.50%
Potato                        : 99.50%
Cauliflower and Broccoli      : 99.00%

Overall Accuracy: 98.95%
----------------------------------------------------------------------------------------------------
# Collect true and predicted labels
y_true = []
y_pred = []
for images, labels in large_test:
    predictions = best_large_cnn.predict(images, verbose=0)
    predicted_labels = np.argmax(predictions, axis=1)
    y_true.extend(labels.numpy())
    y_pred.extend(predicted_labels)
# Generate classification report
report = classification_report(y_true, y_pred, target_names=class_names)
print("Classification Report for the 101x101 Model:\n")
print(report)
Classification Report for the 101x101 Model:
precision recall f1-score support
Bean 0.99 0.99 0.99 200
Bitter_Gourd 1.00 0.99 0.99 200
Brinjal 0.98 0.97 0.98 200
Cabbage 0.98 0.99 0.98 200
Capsicum 1.00 0.99 1.00 200
Cauliflower and Broccoli 0.98 0.98 0.98 200
Cucumber and Bottle_Gourd 0.97 1.00 0.98 200
Potato 1.00 0.98 0.99 200
Pumpkin 0.99 0.99 0.99 200
Radish and Carrot 1.00 0.99 1.00 200
Tomato 0.99 0.99 0.99 200
accuracy 0.99 2200
macro avg 0.99 0.99 0.99 2200
weighted avg 0.99 0.99 0.99 2200
Insights from the Confusion Matrices¶
23x23 model: We see quite a few off-diagonal cells (e.g. Brinjal -> Capsicum, Pumpkin -> Brinjal, Cauliflower&Broccoli -> Potato, etc.). Overall accuracy is high, but there's still noticeable “bleed” between visually similar classes.
101x101 model: The vast majority of predictions land on the diagonal. Only a handful of mistakes remain (e.g. Bitter Gourd -> Bottle Gourd, Cauliflower&Broccoli -> Potato, Brinjal -> Pumpkin), and even those are mostly just 1-3 images per class.
Insights: Higher resolution preserves fine-grained texture and shape cues, letting the network disambiguate, for instance, pumpkins and capsicums much more reliably.
On small inputs, any shapes or textures that overlap between “pairs” of vegetables (e.g. the rough rind of both pumpkin and bitter gourd) become easy to confuse.
On larger inputs, those same cues (vein patterns, stalk shapes, surface texture) become distinct again, so the model almost never mistakes one for the other.
What this tells us:
Spatial detail matters: At 23x23, we're sometimes down to a handful of pixels for a leaf edge or color gradient; at 101x101, those features are much richer.
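To put the resolution gap in numbers, a quick back-of-envelope comparison of the information budget at each input size (pure arithmetic, not a measurement of the models themselves):

```python
# Pixel budget at each input resolution -- a rough sketch of how much
# raw spatial information each model receives per image.
small_pixels = 23 * 23      # 529 pixels
large_pixels = 101 * 101    # 10201 pixels
ratio = large_pixels / small_pixels

print(small_pixels, large_pixels, round(ratio, 1))  # ~19x more pixels
```

Roughly nineteen times more pixels means a leaf edge that occupies 2-3 pixels at 23x23 can span a dozen or more at 101x101, which is the difference between a blur and a recognizable feature.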
Error Analysis¶
Here, we will view the images that our model got wrong for analysis.
def error_analyze(class_names, model, test):
    # 1. Extract all test images and labels
    X_test = []
    y_test = []
    for images, labels in test:
        X_test.extend(images.numpy())  # convert to NumPy
        y_test.extend(labels.numpy())
    X_test = np.array(X_test)
    y_test = np.array(y_test)
    # 2. Make predictions
    y_pred_probs = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred_probs, axis=1)
    # 3. Identify misclassified indices
    wrong_indices = np.where(y_pred_classes != y_test)[0]
    # 4. Show top-2 predictions for some wrong predictions
    N = 5
    rows = []
    for idx in wrong_indices[:N]:
        probs = y_pred_probs[idx]
        top2 = probs.argsort()[-2:][::-1]  # descending top 2
        rows.append({
            'Index': idx,
            'Actual Label': y_test[idx],
            'Actual Class': class_names[y_test[idx]],
            'Predicted Label': y_pred_classes[idx],
            'Predicted Class': class_names[y_pred_classes[idx]],
            'Top-1 Class': class_names[top2[0]],
            'Top-1 Prob': probs[top2[0]],
            'Top-2 Class': class_names[top2[1]],
            'Top-2 Prob': probs[top2[1]],
        })
    top_errors = pd.DataFrame(rows)
    display(top_errors)
    return (X_test, y_test, rows)
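Since the function above already keeps the full probability vectors, a top-2 accuracy figure would quantify how often the correct class is at least the model's second guess. A minimal sketch with toy values (the real `y_test` and `y_pred_probs` from inside `error_analyze` could be substituted):

```python
import numpy as np

# Toy labels and probability vectors, for illustration only.
y_test = np.array([0, 1, 2, 1])
y_pred_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],   # top-1 is wrong (class 0), but class 1 is second
    [0.1, 0.3, 0.6],
    [0.2, 0.3, 0.5],   # top-1 is wrong (class 2), but class 1 is second
])

# Top-1 accuracy: fraction where the argmax matches the true label.
top1 = y_pred_probs.argmax(axis=1)
top1_acc = (top1 == y_test).mean()

# Top-2 accuracy: fraction where the true label is among the two
# highest-probability classes.
top2 = np.argsort(y_pred_probs, axis=1)[:, -2:]
top2_acc = np.any(top2 == y_test[:, None], axis=1).mean()

print(top1_acc, top2_acc)  # 0.5 1.0
```

For the 23x23 model, where several errors above still put the true class in second place with 13-27% confidence, the gap between top-1 and top-2 accuracy would show how close the model comes on its mistakes.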
class_names = sorted(os.listdir("/content/Dataset for CA1 part A - AY2526S1/train"))
Error Analysis for 23x23 Model¶
X_test_small, y_test_small, rows_small = error_analyze(class_names, best_small_cnn, small_test)
69/69 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
|   | Index | Actual Label | Actual Class | Predicted Label | Predicted Class | Top-1 Class | Top-1 Prob | Top-2 Class | Top-2 Prob |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 91 | 1 | Bitter_Gourd | 3 | Cabbage | Cabbage | 0.805765 | Bitter_Gourd | 0.126022 |
| 1 | 101 | 4 | Capsicum | 3 | Cabbage | Cabbage | 0.513060 | Capsicum | 0.386378 |
| 2 | 106 | 5 | Cauliflower and Broccoli | 6 | Cucumber and Bottle_Gourd | Cucumber and Bottle_Gourd | 0.666534 | Cauliflower and Broccoli | 0.267308 |
| 3 | 143 | 2 | Brinjal | 3 | Cabbage | Cabbage | 0.663792 | Brinjal | 0.130903 |
| 4 | 145 | 10 | Tomato | 9 | Radish and Carrot | Radish and Carrot | 0.605660 | Cauliflower and Broccoli | 0.115429 |
N = len(rows_small)
cols = min(N, 5)  # up to 5 images per row
rows_needed = (N + cols - 1) // cols
fig, axes = plt.subplots(rows_needed, cols, figsize=(cols * 3, rows_needed * 3))
if rows_needed == 1:
    axes = np.expand_dims(axes, axis=0)  # make 2D if only 1 row
for i, row in enumerate(rows_small):
    r, c = divmod(i, cols)
    ax = axes[r][c]
    img = X_test_small[row['Index']].squeeze()
    ax.imshow(img, cmap='gray')
    actual = class_names[row['Actual Label']]
    pred = class_names[row['Predicted Label']]
    top1 = row['Top-1 Class']
    top2 = row['Top-2 Class']
    ax.set_title(f"True: {actual}\nPred: {pred}\n1: {top1} ({row['Top-1 Prob']:.2f})\n2: {top2} ({row['Top-2 Prob']:.2f})", fontsize=8)
    ax.axis('off')
# Hide any unused subplots
for i in range(N, rows_needed * cols):
    fig.delaxes(axes.flatten()[i])
plt.tight_layout()
plt.show()
First image: The image's true label is Bitter Gourd, yet the model was about 81% confident it was a cabbage and only 13% confident it was a bitter gourd. To the human eye, the mistake is understandable: the circular shape with the white center shows features similar to a cabbage, so it is a reasonable prediction. Furthermore, the model still assigned the correct class a 13% confidence as its second guess.
Second image: The model's prediction is split between Cabbage and Capsicum. Both tend to share the same round shape, so it is a reasonable prediction.
Third image: To the human eye, this one is completely indistinguishable; there are no specific shapes hinting that it is a Cauliflower and Broccoli. It is impressive that the model still assigned the correct class a 27% confidence, which shows that our model has potential.
Fourth image: The brinjals in the dataset come in different shapes and sizes. This one follows a circular, smooth shape, so predicting Cabbage is reasonable. Furthermore, the model was still able to assign brinjal a 13% confidence, which is somewhat impressive.
Lastly, the final image is a Tomato, but it is shaped like a radish; even I would have guessed Radish and Carrot, so it is a reasonable error in my opinion.
Error Analysis for 101x101 Model¶
X_test_large, y_test_large, rows_large = error_analyze(class_names, best_large_cnn, large_test)
69/69 ━━━━━━━━━━━━━━━━━━━━ 3s 36ms/step
|   | Index | Actual Label | Actual Class | Predicted Label | Predicted Class | Top-1 Class | Top-1 Prob | Top-2 Class | Top-2 Prob |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 164 | 1 | Bitter_Gourd | 3 | Cabbage | Cabbage | 0.918178 | Bitter_Gourd | 0.024346 |
| 1 | 233 | 3 | Cabbage | 6 | Cucumber and Bottle_Gourd | Cucumber and Bottle_Gourd | 0.909914 | Cabbage | 0.085638 |
| 2 | 298 | 7 | Potato | 6 | Cucumber and Bottle_Gourd | Cucumber and Bottle_Gourd | 0.490984 | Potato | 0.203611 |
| 3 | 361 | 5 | Cauliflower and Broccoli | 3 | Cabbage | Cabbage | 0.804141 | Cauliflower and Broccoli | 0.165774 |
| 4 | 363 | 8 | Pumpkin | 5 | Cauliflower and Broccoli | Cauliflower and Broccoli | 0.818096 | Pumpkin | 0.122512 |
N = len(rows_large)
cols = min(N, 5)  # up to 5 images per row
rows_needed = (N + cols - 1) // cols
fig, axes = plt.subplots(rows_needed, cols, figsize=(cols * 3, rows_needed * 3))
if rows_needed == 1:
    axes = np.expand_dims(axes, axis=0)  # make 2D if only 1 row
for i, row in enumerate(rows_large):
    r, c = divmod(i, cols)
    ax = axes[r][c]
    img = X_test_large[row['Index']].squeeze()
    ax.imshow(img, cmap='gray')
    actual = class_names[row['Actual Label']]
    pred = class_names[row['Predicted Label']]
    top1 = row['Top-1 Class']
    top2 = row['Top-2 Class']
    ax.set_title(f"True: {actual}\nPred: {pred}\n1: {top1} ({row['Top-1 Prob']:.2f})\n2: {top2} ({row['Top-2 Prob']:.2f})", fontsize=8)
    ax.axis('off')
# Hide any unused subplots
for i in range(N, rows_needed * cols):
    fig.delaxes(axes.flatten()[i])
plt.tight_layout()
plt.show()
First image: The basket holding the bitter gourds has a texture similar to cabbage leaves. This may introduce noise that causes the model to predict unexpectedly, so predicting cabbage is a justifiable error.
Second image: The model predicts Cucumber and Bottle_Gourd, but the image is clearly a cabbage. Hence, this is an unreasonable error, and the model predicted this one poorly.
Third image: Potatoes and cucumbers/bottle gourds share a similar structure, so the model may confuse the two classes. Hence, it is a reasonable error.
Fourth image: We can clearly see a cauliflower, but there is distracting noise in the background, which can lead to poor predictions. Hence, it is a reasonable error.
Lastly, even though we can clearly see a pumpkin, there is a texture in the background similar to a cauliflower's, which can mislead the model. Hence, this is a reasonable error.
Model Architecture¶
tf.keras.utils.plot_model(best_small_cnn, show_shapes=True, show_layer_names=True, dpi=70, to_file='small_model_architecture.png')
# Use a smaller font for better layout
try:
    font = ImageFont.truetype("arial.ttf", 12)
except OSError:  # font file not available
    font = None
# Render the layered view and display it
image = visualkeras.layered_view(best_small_cnn, legend=True, draw_volume=True, font=font)
display(image)
/usr/local/lib/python3.11/dist-packages/visualkeras/layered.py:86: UserWarning: The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.
warnings.warn("The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.")
tf.keras.utils.plot_model(best_large_cnn, show_shapes=True, show_layer_names=True, dpi=70, to_file='large_model_architecture.png')
# Use a smaller font for better layout
try:
    font = ImageFont.truetype("arial.ttf", 12)
except OSError:  # font file not available
    font = None
# Render the layered view and display it
image = visualkeras.layered_view(best_large_cnn, legend=True, draw_volume=True, font=font)
display(image)
/usr/local/lib/python3.11/dist-packages/visualkeras/layered.py:86: UserWarning: The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.
warnings.warn("The legend_text_spacing_offset parameter is deprecated and will be removed in a future release.")
Comparing the classification accuracies of both 23x23 and 101x101 models.¶
We can observe that the model for the 23x23 images had an accuracy of about 95-96%, whereas the model for the 101x101 images had an accuracy of about 99%.
*(may vary a bit due to rerunning the code)
Why does the model trained and tested on 101x101 images perform better?¶
- Higher Spatial Resolution.
Higher spatial resolution typically means more informative features. Fine patterns, edges, or textures that help distinguish between classes may get lost during aggressive downscaling.
At 23x23, a single convolution kernel might cover a large proportion of the object, reducing the model's ability to detect localized, discriminative features.
- Better Generalization
When the input is rich in detail, the model learns more generalizable and discriminative features, leading to improved accuracy on unseen data.
23x23 inputs might lead the model to memorize coarse features (e.g., object shape) but miss nuances (e.g., texture, borders).
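The point about a kernel covering a large proportion of the object can be made concrete with a small sketch. Assuming a standard 3x3 convolution kernel (a hypothetical but typical choice, not necessarily the exact kernel size used in our models):

```python
# Fraction of the input image covered by a single 3x3 kernel at each
# resolution -- a rough sketch of why localized features are harder
# to isolate at 23x23.
def kernel_coverage(image_size, kernel_size=3):
    """Return the fraction of the image area one kernel position sees."""
    return (kernel_size / image_size) ** 2

print(f"{kernel_coverage(23):.2%}")    # 1.70% of a 23x23 image
print(f"{kernel_coverage(101):.2%}")   # 0.09% of a 101x101 image
```

At 23x23 a single kernel position already sees nearly 2% of the whole image, so the first layer is forced to mix object and background; at 101x101 the same kernel sees under 0.1%, leaving room to learn genuinely local texture detectors.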
Insights¶
Model trained on 23x23 Images¶
- Advantages
The model trained with 23x23 pixel images exhibits a significantly faster training time compared to the model trained with 101x101 images. This is largely due to the smaller input size, which requires fewer computations for both forward and backward passes during training.
This makes the 23x23 model a more attractive option for scenarios where quick model iteration is necessary such as real-time computations. Moreover, the smaller model size also implies that the 23x23 resolution would be more suitable for deployment on edge devices, such as mobile phones or IoT devices, which often have limited processing power and memory.
- Disadvantages
However, while the 23x23 model is very efficient in terms of speed, there is a noticeable trade-off in accuracy. The roughly 96% accuracy is respectable, but it comes at the cost of losing the fine-grained information available in higher-resolution images.
The relatively lower accuracy may not be a significant issue in many real-time applications, such as face recognition, object detection in low-resource environments, or simple classification tasks where a small error margin is acceptable. However, for more sensitive applications, the decrease in accuracy could lead to undesirable outcomes.
Model Trained on 101x101 Images¶
- Advantages
On the other hand, the model trained with 101x101 images achieves an accuracy of 99%, which is clearly superior in terms of predictive power. The increased resolution allows the model to capture more fine-grained details, which can be crucial in domains requiring high precision.
For example, in medical AI applications, such as cancer detection or diagnostic imaging, the ability to discern subtle patterns or abnormalities in high-resolution images can be the difference between an accurate diagnosis and a false one. In such cases, sacrificing accuracy for speed or resource efficiency would not be acceptable.
- Disadvantages
The trade-off, however, is that the 101x101 model requires considerably more computational resources, both in terms of memory and processing power. Larger input sizes increase the complexity of the model, leading to longer training times and higher resource consumption during inference. For deployment on devices with limited computational capabilities, this could pose significant challenges.
Additionally, the increased memory requirements for higher-resolution images could limit the scalability of the model when dealing with large datasets.
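The resource argument can be made concrete with a back-of-envelope multiply-accumulate (MAC) count for one hypothetical 3x3 convolution layer with 32 filters on a single-channel input and 'same' padding (illustrative parameters, not the actual models' layers):

```python
# Approximate MACs for one 3x3 conv layer ('same' padding), assuming
# hypothetical layer parameters chosen purely for illustration.
def conv_macs(hw, k=3, c_in=1, c_out=32):
    """MACs = output positions (hw*hw) x kernel area x in/out channels."""
    return hw * hw * k * k * c_in * c_out

small = conv_macs(23)    # 152352
large = conv_macs(101)   # 2937888
print(small, large, round(large / small, 1))  # ~19x more compute
```

Because the cost of a 'same'-padded convolution scales with the number of output positions, the ~19x pixel-count gap between 23x23 and 101x101 translates directly into ~19x more compute per layer, which is why the larger model trains and infers so much more slowly.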
To summarize:¶
The choice between 23x23 and 101x101 image resolutions is a balancing act between speed and accuracy.
For real-time applications and deployment on edge devices, the 23x23 resolution may be more practical despite the slight accuracy loss.
Conversely, when high precision is crucial, such as in medical AI or high-stakes industrial applications, the 101x101 resolution offers superior accuracy, at the cost of increased computational demand.
Ultimately, the decision on which model to deploy depends on the specific requirements of the task at hand, the computational resources available, and the acceptable margin of error for the application. Further experimentation, including the use of intermediate resolutions or techniques such as transfer learning, may help mitigate some of these trade-offs.